Problems when training on the GPU

The following discussion is based on:
MXNet version: 1.2
Operating system: Ubuntu
When training a CNN on the GPU, I get this error:
```
File "/data_1/renzhen/KesciBDC2018/BDC2018/gluon_utils/__init__.py", line 67, in gluon_train
    outputs = [net(X) for X in data]
File "/data_1/renzhen/KesciBDC2018/BDC2018/gluon_utils/__init__.py", line 67, in <listcomp>
    outputs = [net(X) for X in data]
File "/data_1/renzhen/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 413, in __call__
    return self.forward(*args)
File "/data_1/renzhen/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 627, in forward
    return self._call_cached_op(x, *args)
File "/data_1/renzhen/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 528, in _call_cached_op
    out = self._cached_op(*cargs)
File "/data_1/renzhen/anaconda3/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 149, in __call__
    ctypes.byref(out_stypes)))
File "/data_1/renzhen/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [16:27:05] src/imperative/cached_op.cc:361: Check failed: inputs[i]->ctx() == default_ctx (cpu(0) vs. gpu(0)) CachedOp requires all inputs to live on the same context. But data is on gpu(0) while conv0_weight is on cpu(0)
```

It looks like the data is on the GPU while the weights are on the CPU. How do I get both onto the GPU?

Try net.collect_params().reset_ctx(mx.gpu(0))?

2赞

Yes, that works. Do I have to write it this way every time? Is there something like specifying ctx when declaring a HybridSequential?

Ah, I found the way: initialize has a parameter for it. Thanks!

Could an expert explain how to avoid this problem from the start?

What if it is an MXNet NDArray? How do I put it on the same context?