Fully Convolutional Networks (FCN) Discussion Thread


#48

I found that on another machine with cu80 this works fine... though maybe that's because the path on that machine is in English? Any help would be appreciated...


#49

You can ignore that part; it is just the path on the build machine.

I think the actual error is

C:\Users\自宅用/.mxnet/models\resnet18_v2-8aacf80f.params

where the latter part uses Unix separators @szha
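
For illustration only (this is not the actual mxnet code), mixed separators like that typically come from concatenating a hard-coded '/' fragment with os.path.join, which uses '\' on Windows:

import os

# Consistent: os.path.join uses the platform separator throughout.
good = os.path.join(os.path.expanduser('~'), '.mxnet', 'models',
                    'resnet18_v2-8aacf80f.params')
# On Windows: C:\Users\<user>\.mxnet\models\resnet18_v2-8aacf80f.params

# Mixed: a hard-coded '/' fragment joined with os.path.join reproduces the
# pattern seen in the error message above.
bad = os.path.join(os.path.expanduser('~') + '/.mxnet/models',
                   'resnet18_v2-8aacf80f.params')
# On Windows: C:\Users\<user>/.mxnet/models\resnet18_v2-8aacf80f.params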


#50

orz, I'll fix it.


#51

The fix is in https://github.com/apache/incubator-mxnet/pull/9352
and will be included in a release once it is merged.


#52

Running the FCN code from the tutorial, I get: mxnet/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered


#56

Thanks! I tried it but it doesn't seem to work. I don't really know how to use GitHub yet, so after looking at what was updated I simply replaced the E:\Miniconda\envs\gluon\Lib\site-packages\mxnet folder.

The error seems to have changed:

MXNetError Traceback (most recent call last)
in ()
1 from mxnet.gluon.model_zoo import vision as modelss
----> 2 preta_net = modelss.alexnet(pretrained=True)

E:\Miniconda\envs\gluon\lib\site-packages\mxnet\gluon\model_zoo\vision\alexnet.py in alexnet(pretrained, ctx, root, **kwargs)
84 if pretrained:
85 from ..model_store import get_model_file
---> 86 net.load_params(get_model_file('alexnet', root=root), ctx=ctx)
87 return net

E:\Miniconda\envs\gluon\lib\site-packages\mxnet\gluon\block.py in load_params(self, filename, ctx, allow_missing, ignore_extra)
260 “”"
261 self.collect_params().load(filename, ctx, allow_missing, ignore_extra,
–> 262 self.prefix)
263
264

E:\Miniconda\envs\gluon\lib\site-packages\mxnet\gluon\parameter.py in load(self, filename, ctx, allow_missing, ignore_extra, restore_prefix)
662 lprefix = len(restore_prefix)
663 loaded = [(k[4:] if k.startswith('arg:') or k.startswith('aux:') else k, v)
--> 664 for k, v in ndarray.load(filename).items()]
665 arg_dict = {restore_prefix+k: v for k, v in loaded}
666 if not allow_missing:

E:\Miniconda\envs\gluon\lib\site-packages\mxnet\ndarray\utils.py in load(fname)
173 ctypes.byref(handles),
174 ctypes.byref(out_name_size),
--> 175 ctypes.byref(names)))
176 if out_name_size.value == 0:
177 return [_ndarray_cls(NDArrayHandle(handles[i])) for i in range(out_size.value)]

E:\Miniconda\envs\gluon\lib\site-packages\mxnet\base.py in check_call(ret)
144 “”"
145 if ret != 0:
–> 146 raise MXNetError(py_str(_LIB.MXGetLastError()))
147
148

MXNetError: [21:22:02] C:\projects\mxnet-distro-win\mxnet-build\dmlc-core\src\io\local_filesys.cc:166: Check failed: allow_null LocalFileSystem: fail to open "C:\Users\自宅用.mxnet\models\alexnet-44335d1f.params"

The backslash isn't actually missing; it somehow got dropped when I copied the text earlier, so I deleted that part.

I checked: in the key file alexnet.py, the definition
def alexnet(pretrained=False, ctx=cpu(),
            root=os.path.join('~', '.mxnet', 'models'), **kwargs):
has indeed been updated,
and the path on Windows really is
C:\Users\自宅用.mxnet\models\alexnet-44335d1f.params, so there seems to be some other problem...


#57

...How strange: in the input box the backslash after 自宅用 is still there, but it disappears once the post is submitted?
C:\Users\自宅用.mxnet\models
Could this be related to the problem?


#58

The correct path should be the following @szha

C:\Users\自宅用\.mxnet\models\alexnet-44335d1f.params

#59

You could try using only gpu(0).
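
A minimal sketch of pinning everything to a single GPU (the model here is only an example, and this assumes a machine with at least one GPU):

import mxnet as mx
from mxnet.gluon.model_zoo import vision

ctx = mx.gpu(0)  # use only the first GPU
net = vision.resnet18_v2(pretrained=True, ctx=ctx)  # parameters go straight to gpu(0)
x = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)  # keep the data on the same context
y = net(x)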


#60

See the screenshot.

Copying out the selected part gives:
C:\Users\自宅用.mxnet\models\alexnet-44335d1f.params
It looks the same in the input box, but once posted the backslash after 自宅用 disappears.


#61

It really does seem to be the Chinese path that fails: changing root to an English path works, while changing it to a different Chinese path brings back the same error.
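
For anyone hitting the same thing, a minimal sketch of that workaround (the directory name is only an example): pass an ASCII-only root to the model-zoo call so the pretrained weights are downloaded to, and loaded from, an English path:

from mxnet.gluon.model_zoo import vision

# Store the weights under an ASCII-only directory instead of the default
# ~/.mxnet/models, which here contains a Chinese user name.
pretrained_net = vision.alexnet(pretrained=True, root='E:/mxnet_models')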


#62

I want to save the parameters from the final training run:
params_name = "fcn.params"
net.save_params(params_name)
but it raises an error:
File "fcn.py", line 206, in <module>
net.save_params(params_name)
File "/home/yhl/Documents/Workspace/mxnet/mxnet0120/python/mxnet/gluon/block.py", line 244, in save_params
self.collect_params().save(filename, strip_prefix=self.prefix)
File "/home/yhl/Documents/Workspace/mxnet/mxnet0120/python/mxnet/gluon/parameter.py", line 591, in save
strip_prefix, param.name, strip_prefix))
ValueError: Prefix hybridsequential0_ is to be striped before saving, but Parameter resnetv20_batchnorm0_gamma does not start with hybridsequential0_. If you are using Block.save_params, This may be due to your Block shares parameters from other Blocks or you forgot to use with name_scope() during init. Consider switching to Block.collect_params.save and Block.collect_params.load instead.

How should I save the whole network in this case?


#63

It's probably because you didn't add name_scope(). For example:

net = nn.HybridSequential()
with net.name_scope():
    for layer in pretrained_net.features[:-2]:
        net.add(layer)
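
If wrapping the layers in name_scope() still doesn't help (as reported a couple of posts below), the error message itself suggests an alternative that bypasses the prefix check. A minimal sketch, reusing the file name from the question and assuming ctx is the context you trained on:

# Save/load through the parameter dictionary instead of Block.save_params,
# which avoids the strip_prefix mismatch for reused pretrained layers.
net.collect_params().save('fcn.params')
net.collect_params().load('fcn.params', ctx=ctx)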

#64

OK, thanks a lot, I'll give it another try!


#65

[15:12:46] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [15:12:46] src/operator/./deconvolution-inl.h:443: Check failed: param_.workspace >= required_size (134217728 vs. 206488800)
Minimum workspace size: 825955200 Bytes
Given: 536870912

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x272c4c) [0x7fb6ed6c9c4c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23cafb9) [0x7fb6ef821fb9]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23cebff) [0x7fb6ef825bff]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2302bf4) [0x7fb6ef759bf4]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x214e430) [0x7fb6ef5a5430]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x20cd51d) [0x7fb6ef52451d]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x20d1551) [0x7fb6ef528551]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x5e3e270) [0x7fb6f3295270]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fb7155926ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fb7152c83dd]

[15:12:46] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [15:12:46] src/engine/./threaded_engine.h:370: [15:12:46] src/operator/./deconvolution-inl.h:443: Check failed: param_.workspace >= required_size (134217728 vs. 206488800)
Minimum workspace size: 825955200 Bytes
Given: 536870912

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x272c4c) [0x7fb6ed6c9c4c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23cafb9) [0x7fb6ef821fb9]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23cebff) [0x7fb6ef825bff]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2302bf4) [0x7fb6ef759bf4]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x214e430) [0x7fb6ef5a5430]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x20cd51d) [0x7fb6ef52451d]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x20d1551) [0x7fb6ef528551]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x5e3e270) [0x7fb6f3295270]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fb7155926ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fb7152c83dd]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x272c4c) [0x7fb6ed6c9c4c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x20cd7c4) [0x7fb6ef5247c4]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x20d1551) [0x7fb6ef528551]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x5e3e270) [0x7fb6f3295270]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fb7155926ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fb7152c83dd]

terminate called after throwing an instance of 'dmlc::Error'
what(): [15:12:46] src/engine/./threaded_engine.h:370: [15:12:46] src/operator/./deconvolution-inl.h:443: Check failed: param_.workspace >= required_size (134217728 vs. 206488800)
Minimum workspace size: 825955200 Bytes
Given: 536870912

Stack trace returned 10 entries:

GPU: GTX 1080 (8 GB)
Even with batch_size = 1 I still get the error above. Could it be that resnet18 runs out of memory? If anyone knows, please explain. Thanks.


#66

I already tried this method and still get the same error; from the message it looks like some parameter name doesn't match.


#67

I built a small dataset to see how the results look, but I get test_acc: nan and don't know why:
('Start training on ', gpu(0))
Epoch 0. Loss: 0.061, Train acc 0.98, Test acc nan, Time 1.4 sec
Epoch 1. Loss: 0.052, Train acc 0.98, Test acc nan, Time 1.3 sec
Epoch 2. Loss: 0.064, Train acc 0.97, Test acc nan, Time 1.3 sec
Epoch 3. Loss: 0.046, Train acc 0.98, Test acc nan, Time 1.3 sec
Epoch 4. Loss: 0.068, Train acc 0.97, Test acc nan, Time 1.3 sec
Epoch 5. Loss: 0.043, Train acc 0.98, Test acc nan, Time 1.3 sec
Epoch 6. Loss: 0.068, Train acc 0.97, Test acc nan, Time 1.4 sec
Epoch 7. Loss: 0.057, Train acc 0.98, Test acc nan, Time 1.4 sec
Epoch 8. Loss: 0.052, Train acc 0.98, Test acc nan, Time 1.3 sec
Epoch 9. Loss: 0.048, Train acc 0.98, Test acc nan, Time 1.3 sec
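
One thing worth checking (just a guess, not something established in this thread): if the test iterator yields no batches at all, for example because every test image is smaller than the crop size and gets filtered out, a typical accuracy loop ends up computing 0 / 0, which prints as nan. A sketch of that failure mode with made-up names:

from mxnet import nd

def evaluate_accuracy(data_iter, net, ctx):
    # Toy accuracy loop; data_iter, net and ctx are placeholders.
    acc, n = nd.array([0.0], ctx=ctx), 0
    for data, label in data_iter:  # if data_iter is empty, this loop never runs
        data, label = data.as_in_context(ctx), label.as_in_context(ctx)
        acc += (net(data).argmax(axis=1) == label.astype('float32')).sum()
        n += label.size
    return acc.asscalar() / n  # 0 / 0 -> nan when the iterator was empty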


#68

I don't quite understand why a custom bilinear_kernel function is used to initialize conv_trans instead of something like mx.init.Bilinear(). @mli
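
For context, a bilinear upsampling kernel for a transposed convolution is typically built along these lines (a sketch in the spirit of the tutorial's helper, not a verbatim copy, and assuming in_channels == out_channels == number of classes). The key point is that it only fills the weight for matching input/output channel pairs, so each class channel is upsampled independently:

import numpy as np
from mxnet import nd

def bilinear_kernel(in_channels, out_channels, kernel_size):
    # Returns a weight of shape (in_channels, out_channels, k, k) for Conv2DTranspose.
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = np.ogrid[:kernel_size, :kernel_size]
    filt = ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))
    weight = np.zeros((in_channels, out_channels, kernel_size, kernel_size),
                      dtype='float32')
    weight[range(in_channels), range(out_channels), :, :] = filt
    return nd.array(weight)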


#69

I have the same problem. Did you solve it?


#70

A question: the paper says that after conv6-7 the feature map shrinks to 1/32 of the original image, so upsampling by a factor of 32 gives an output the same size as the original image. But conv6 has a 7×7 kernel, so with a 224×224×3 input shouldn't the output be 1×1×21? I'm just starting to learn and don't quite understand this part. Thanks for any explanation!
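
For reference, the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, applied to the numbers in the question (my own illustration; whether padding is used at this layer depends on the particular implementation):

def conv_out(n, k, p=0, s=1):
    # Output size of a convolution: floor((n + 2p - k) / s) + 1.
    return (n + 2 * p - k) // s + 1

n = 224 // 32                 # spatial size after the five 2x down-samplings: 7
print(conv_out(n, k=7))       # 7x7 kernel, no padding -> 1, i.e. 1x1x21 class scores
print(conv_out(n, k=7, p=3))  # 7x7 kernel, padding 3  -> 7, keeping the 1/32 ratio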