Networks Using Repeating Elements (VGG): Discussion


#62

I implemented a VGG-16 network following the VGG paper. It ran fine before I modified the utils.load_data_fashion_mnist function, but after my changes, removing resize (or changing it to resize=224) raises an error. Why?
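
For the no-resize case, one likely cause, assuming the chapter's five-block VGG where each block ends in a 2x2 max-pool with stride 2: Fashion-MNIST images are 28x28, and five halvings collapse the feature map before the last block, whereas 224x224 survives all five. A quick sanity check:

h = 28
for block in range(1, 6):
    h = h // 2  # spatial size after this block's 2x2, stride-2 max-pool
    print('after block %d: %dx%d' % (block, h, h))  # 14, 7, 3, 1, 0

So without a resize, the fifth pooling layer has nothing left to pool and the forward pass raises a shape error.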


#63

Hello everyone, I'd like to ask a question. I wrote a from-scratch VGG-11 model with the same architecture as the gluon VGG.
The only difference is that my custom parameter initialization uses weight_scale = 0.01, unlike gluon's Xavier init. When I then optimize with SGD, trying lr = [0.01, 0.1, 0.5], the test_acc comes out around 0.1. What could be going on? Is it related to the parameter initialization?
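
A possible explanation, based on general initialization behavior rather than your exact code: with a fixed weight_scale = 0.01 at every layer, activations in a network as deep as VGG-11 shrink layer by layer, gradients vanish, and accuracy sticks near 0.1 (random guessing over 10 classes). Xavier instead ties each layer's weight variance to its fan-in and fan-out. A minimal Gluon sketch (layer sizes here are illustrative, not VGG-11):

from mxnet import init, nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Conv2D(64, kernel_size=3, padding=1, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Flatten(),
        nn.Dense(10))
# Xavier draws each layer's weights with a variance tied to fan-in/fan-out,
# so activation magnitudes stay roughly constant through the depth.
net.initialize(init=init.Xavier())
net(nd.random.uniform(shape=(1, 1, 28, 28)))  # trigger shape inference
print(net[0].weight.data().asnumpy().std())   # per-layer scale chosen by Xavier

Increasing weight_scale, or switching your from-scratch init to Xavier, is worth trying before tuning lr further.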


#64

Running the tutorial code throws an error:

Start training on  gpu(0)
---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-5-7415ec11787d> in <module>()
     15                         'sgd', {'learning_rate': 0.05})
     16 utils.train(train_data, test_data, net, loss,
---> 17             trainer, ctx, num_epochs=5)

~/LearnAndWork/Workspace/gluon-tutorials/utils.py in train(train_data, test_data, net, loss, trainer, ctx, num_epochs, print_batches)
    140                 l.backward()
    141             train_acc += sum([(yhat.argmax(axis=1)==y).sum().asscalar()
--> 142                               for yhat, y in zip(outputs, label)])
    143             train_loss += sum([l.sum().asscalar() for l in losses])
    144             trainer.step(batch_size)

~/LearnAndWork/Workspace/gluon-tutorials/utils.py in <listcomp>(.0)
    140                 l.backward()
    141             train_acc += sum([(yhat.argmax(axis=1)==y).sum().asscalar()
--> 142                               for yhat, y in zip(outputs, label)])
    143             train_loss += sum([l.sum().asscalar() for l in losses])
    144             trainer.step(batch_size)

~/.local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asscalar(self)
   1906         if self.shape != (1,):
   1907             raise ValueError("The current array is not a scalar")
-> 1908         return self.asnumpy()[0]
   1909 
   1910     def astype(self, dtype, copy=True):

~/.local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asnumpy(self)
   1888             self.handle,
   1889             data.ctypes.data_as(ctypes.c_void_p),
-> 1890             ctypes.c_size_t(data.size)))
   1891         return data
   1892 

~/.local/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
    147     """
    148     if ret != 0:
--> 149         raise MXNetError(py_str(_LIB.MXGetLastError()))
    150 
    151 

MXNetError: [21:58:58] src/operator/nn/./cudnn/cudnn_convolution-inl.h:744: Failed to find any forward convolution algorithm.

Stack trace returned 10 entries:
[bt] (0) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x31f712) [0x7fc1b0b49712]
[bt] (1) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x31fce8) [0x7fc1b0b49ce8]
[bt] (2) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3176d28) [0x7fc1b39a0d28]
[bt] (3) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3178435) [0x7fc1b39a2435]
[bt] (4) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x31794e6) [0x7fc1b39a34e6]
[bt] (5) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3182811) [0x7fc1b39ac811]
[bt] (6) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2ad9806) [0x7fc1b3303806]
[bt] (7) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a54c03) [0x7fc1b327ec03]
[bt] (8) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a5d004) [0x7fc1b3287004]
[bt] (9) /home/gaunthan/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a60cab) [0x7fc1b328acab]

Has anyone else run into this problem? Could you share your solution? :smiley:


#65

My error message is different from yours, but it occurs at the same place, so it looks like a similar problem. VGG is a fairly large model; try reducing the batch size a bit?
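
If it is the same out-of-memory symptom, the quickest experiment is to reload the data with a smaller batch. A sketch assuming the tutorial's utils.load_data_fashion_mnist signature:

import utils  # the tutorial's helper module

# cuDNN's "Failed to find any forward convolution algorithm" is often
# GPU memory exhaustion in disguise; halve the batch until it goes away.
batch_size = 64
train_data, test_data = utils.load_data_fashion_mnist(batch_size, resize=96)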


#66

Why doesn't my hand calculation match the output of

blk = vgg_block(2, 128)
blk.initialize()
x = nd.random.uniform(shape=(2, 3, 16, 16))
y = blk(x)
print(y.shape)

It prints (2, 128, 8, 8), but I calculated (2, 128, 7, 7).
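
For later readers, the 8 comes from the pooling layer alone. Assuming the chapter's vgg_block (3x3 convs with padding=1 followed by a 2x2 max-pool with stride 2), the convs preserve the 16x16 size and only the pool shrinks it:

# pooling output size = floor((in - kernel) / stride) + 1
def pool_out(n, kernel=2, stride=2):
    return (n - kernel) // stride + 1

print(pool_out(16))  # 8, matching the observed (2, 128, 8, 8)

Computing (16 - 2) / 2 + 1 gives 8; getting 7 usually means the + 1 was dropped.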


#67

Never mind, I found where I went wrong.


#68

This looks like a typo? It should be that the first two blocks use a single convolutional layer and the last three blocks use double convolutional layers. :slightly_smiling_face:


#69

Fixed, thanks for pointing it out.


#70

I want to print the model parameters.


Why does the listing start at layers 8 and 9?
And the shapes don't match the convolutional layers' sizes either.


#71

Figured it out: initialize() had already been run once earlier.


#72

A question about network initialization:
When net.initialize is called, the input to the network is still unknown. The images might have 1 channel or 3 channels, so the number of input channels of the first layer's convolution kernel is also undetermined. How exactly does the initialization mechanism handle this?


#75

Just call net.initialize and then print net. For example, the first layer shows: Sequential( (0): Conv2D(None -> 96, kernel_size=(11, 11), stride=(4, 4), Activation(relu)) ). In other words, the kernel's channel count (more precisely, the number of input channels) is recorded as None, meaning it can still take any value.
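
A minimal sketch of this deferred initialization (my own example, mirroring the Conv2D above): parameter shapes stay unknown until the first forward pass, which is when the input channel count is inferred and the weights are actually allocated.

from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Conv2D(96, kernel_size=11, strides=4, activation='relu'))
net.initialize()   # registers the initializer; allocates nothing yet
print(net)         # shows Conv2D(None -> 96, ...): in_channels still unknown

net(nd.random.uniform(shape=(1, 3, 224, 224)))  # first forward pass
print(net[0].weight.shape)  # now (96, 3, 11, 11): in_channels inferred as 3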


#76

resize 224 -> 96: training takes less time and converges faster:

resize=224:
epoch 1, loss 0.8487, train acc 0.692, test acc 0.833, time 168.0 sec
epoch 2, loss 0.4113, train acc 0.849, test acc 0.881, time 160.4 sec
epoch 3, loss 0.3361, train acc 0.877, test acc 0.899, time 160.8 sec

resize=96:
epoch 1, loss 1.1234, train acc 0.580, test acc 0.804, time 41.8 sec
epoch 2, loss 0.5098, train acc 0.812, test acc 0.850, time 39.5 sec
epoch 3, loss 0.4051, train acc 0.851, test acc 0.873, time 39.7 sec
epoch 4, loss 0.3524, train acc 0.871, test acc 0.875, time 39.8 sec
epoch 5, loss 0.3169, train acc 0.884, test acc 0.893, time 39.9 sec


#77

Does gluon support Intel GPUs now?


#78

The input height and width are each halved (2 x 2 = a 4x reduction in spatial positions), while the number of channels is doubled. Is the complexity still comparable? I don't understand; could someone explain?
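
My own back-of-the-envelope answer, not from the chapter: a convolution costs roughly H x W x C_in x C_out x k x k multiply-adds. Halving H and W divides that by 4, and doubling both the input and output channel counts multiplies it by 4, so successive VGG stages stay at comparable cost:

def conv_cost(h, w, c_in, c_out, k=3):
    # approximate multiply-adds for one 3x3 convolution layer
    return h * w * c_in * c_out * k * k

stage1 = conv_cost(112, 112, 64, 64)  # a conv inside the 64-channel stage
stage2 = conv_cost(56, 56, 128, 128)  # next stage: half the size, double channels
print(stage2 / stage1)                # 1.0 -- same order of cost

(Only the first conv of each stage has C_in carried over from the previous stage, so the match is approximate rather than exact.)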