Access, Initialization, and Sharing of Model Parameters - Discussion


#45

The assignment happens in nd.random.uniform(low=5, high=10, out=arr): this statement samples a uniform distribution with minimum 5 and maximum 10 and writes the resulting values into arr, which is why arr.shape can then be printed.
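In the tutorial's custom initializer, arr is the parameter's NDArray that gets passed into _init_weight, so out=arr fills it in place without specifying a shape. A standalone sketch of the out= behavior (the shape below is just an example):

from mxnet import nd

arr = nd.zeros((3, 4))                      # placeholder array; shape chosen only for illustration
nd.random.uniform(low=5, high=10, out=arr)  # samples are written into arr in place
print(arr.shape)                            # (3, 4)
print(arr)                                  # all values lie in [5, 10)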


#46

The instance has not been initialized yet, so the parameter shapes of the layers show up as (4, 0), (4, 0), (2, 0) in turn.
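A small sketch of what that looks like before and after shape inference (layer sizes follow the tutorial's example; the input shape (3, 5) is an assumption):

from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(4, activation="relu"))
    net.add(nn.Dense(4, activation="relu"))
    net.add(nn.Dense(2))

print(net.collect_params())           # weight shapes show as (4, 0), (4, 0), (2, 0)

net.initialize()
net(nd.random.uniform(shape=(3, 5)))  # the first forward pass infers the input dims
print(net.collect_params())           # now (4, 5), (4, 4), (2, 4)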


#47

net[-1] refers to the parameters of the layer right before the current one. At the moment the net[2] layer is being defined, net effectively has only two layers, namely the two below:

net.add(nn.Dense(4, activation="relu"))
net.add(nn.Dense(4, activation="relu"))

Only after this line has run

net.add(nn.Dense(4, activation="relu", params=net[-1].params))

does it become three layers. Test it yourself and it should become clear.
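A quick way to check this (a minimal sketch):

from mxnet.gluon import nn

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(4, activation="relu"))
    net.add(nn.Dense(4, activation="relu"))
    # at this point net[-1] is the second Dense layer that was just added
    print(net[-1].name)
    net.add(nn.Dense(4, activation="relu", params=net[-1].params))
print(net)  # now lists three Dense layers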


#48

For updating shared weights, I think the gradients are still computed in the normal way; each layer's gradient is still updated automatically, with no special handling.
So the same weight gets updated twice, but since the gradients being applied live at the same address, the printed results come out identical.
Is this understanding correct?
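One thing that is easy to verify is the "same address" part: with shared parameters the two layers hold literally the same Parameter object, so their gradients are the same NDArray (a minimal sketch using the tutorial's shared net):

from mxnet import nd, autograd
from mxnet.gluon import nn

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(4, activation="relu"))
    net.add(nn.Dense(4, activation="relu"))
    net.add(nn.Dense(4, activation="relu", params=net[-1].params))
    net.add(nn.Dense(2))
net.initialize()

x = nd.random.uniform(shape=(3, 5))
with autograd.record():
    y = net(x)
y.backward()

print(net[1].weight is net[2].weight)                # True: one shared Parameter
print(net[1].weight.grad() is net[2].weight.grad())  # True: one shared gradient array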


#49

net[0].params().weight cannot be accessed this way.
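That is probably because params is a property that returns a ParameterDict, not a method, so calling it with () raises an error. A sketch of accesses that should work instead:

from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(4, activation="relu"))
net.add(nn.Dense(2))
net.initialize()
net(nd.random.uniform(shape=(3, 5)))

print(net[0].weight)                            # the first layer's weight Parameter
print(net[0].weight.data())                     # its values
print(net[0].params[net[0].prefix + 'weight'])  # same Parameter, looked up by name in the ParameterDict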


#50

MXNet seems to have rather few models that others have already trained on large datasets. For developers with limited resources that is a big challenge: everything has to be trained from scratch, which is quite inefficient. Is there a way to convert models trained in TensorFlow?


#51

Could 李沐 (Mu Li) give some guidance… how do I get the grad out?


#52

The model zoo provides pre-trained parameters for many image classification models: https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html
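Loading one of those pretrained models looks roughly like this (a sketch; the first call downloads the ImageNet weights):

from mxnet.gluon.model_zoo import vision

net = vision.resnet18_v1(pretrained=True)  # downloads and loads pretrained parameters on first use
print(net)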


#53

You can find the parameter on the block and then access it with grad(), e.g. net.weight.grad()
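A minimal sketch, assuming net has already been built and initialized and that a backward() pass has been run (gradients are only populated after backward):

print(net[0].weight.grad())  # gradient of the first layer's weight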


#54

The following code raises an exception when run, but the error does not appear if the parameters are not shared. The code:

from mxnet import nd
from mxnet import gluon
from mxnet import autograd
from mxnet.gluon import nn

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(15))
    net.add(nn.Dense(10))
    net.add(nn.Dense(10, params=net[1].params))
    net.add(nn.Dense(1))

Number_Example = 200
Number_Input = 2

True_w = [2, 5]
True_b = 10

X = nd.random.uniform(shape=(Number_Example, Number_Input))
Y = (X[:, 0] ** 2) * True_w[0] + X[:, 1] * True_w[1] + True_b
Y += .01 * nd.random.normal(shape=Y.shape)
print(X.shape)
print(Y.shape)

batch_size = 10
dataset = gluon.data.ArrayDataset(X, Y)
data_iter = gluon.data.DataLoader(dataset, batch_size, shuffle=True)

net.initialize()
square_loss = gluon.loss.L2Loss()

print(net.collect_params())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

epochs = 20
batch_size = 10
for e in range(epochs):
    total_loss = 0
    for data, label in data_iter:
        with autograd.record():
            print(data)
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
        total_loss += nd.sum(loss).asscalar()
    print("Epoch %d, average loss: %f" % (e, total_loss / Number_Example))
    # print("Net[1] grad:", net[1].weight.grad())
    # print("Net[2] grad:", net[2].weight.grad())
print(net.collect_params())

The exception:
MXNetError                                Traceback (most recent call last)
<ipython-input-...> in <module>()
     38     with autograd.record():
     39         print(data)
---> 40         output = net(data)
     41         loss = square_loss(output, label)
     42     loss.backward()

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\gluon\block.py in __call__(self, *args)
    358     def __call__(self, *args):
    359         """Calls forward. Only accepts positional arguments."""
--> 360         return self.forward(*args)
    361
    362     def forward(self, *args):

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\gluon\nn\basic_layers.py in forward(self, x)
     51     def forward(self, x):
     52         for block in self._children:
---> 53             x = block(x)
     54         return x
     55

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\gluon\block.py in __call__(self, *args)
    358     def __call__(self, *args):
    359         """Calls forward. Only accepts positional arguments."""
--> 360         return self.forward(*args)
    361
    362     def forward(self, *args):

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\gluon\block.py in forward(self, x, *args)
    573                 return self._call_cached_op(x, *args)
    574             params = {i: j.data(ctx) for i, j in self._reg_params.items()}
--> 575             return self.hybrid_forward(ndarray, x, *args, **params)
    576
    577         assert isinstance(x, Symbol), \

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\gluon\nn\basic_layers.py in hybrid_forward(self, F, x, weight, bias)
    204     def hybrid_forward(self, F, x, weight, bias=None):
    205         act = F.FullyConnected(x, weight, bias, no_bias=bias is None, num_hidden=self._units,
--> 206                                flatten=self._flatten, name='fwd')
    207         if self.act is not None:
    208             act = self.act(act)

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\ndarray\register.py in FullyConnected(data, weight, bias, num_hidden, no_bias, flatten, out, name, **kwargs)

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\_ctypes\ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out)
     90                 c_str_array(keys),
     91                 c_str_array([str(s) for s in vals]),
---> 92                 ctypes.byref(out_stypes)))
     93
     94     if original_output is not None:

D:\Anaconda3\envs\gluon\lib\site-packages\mxnet\base.py in check_call(ret)
    146     """
    147     if ret != 0:
--> 148         raise MXNetError(py_str(_LIB.MXGetLastError()))
    149
    150

MXNetError: Shape inconsistent, Provided = [10,15], inferred shape=(10,10)


#55
# suppose the input dimension is k
with net.name_scope():
    net.add(nn.Dense(15))                        # w: k * 15
    net.add(nn.Dense(10))                        # w: 15 * 10
    net.add(nn.Dense(10, params=net[1].params))  # w: k * 15
    net.add(nn.Dense(1))

So for the third layer to be able to reuse the first layer's parameters, k has to be set to 10 here.


#56

Thanks for the help. I naively assumed that the first layer here is net[0].


#57

Could @mli give some guidance: how can I tell which layer a gradient belongs to?
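One way to do this (a sketch, assuming net is built, initialized, and a backward() pass has been run) is to go through collect_params(): the parameter names encode which layer they belong to.

for name, param in net.collect_params().items():
    # names such as sequential0_dense0_weight identify the layer the gradient belongs to
    print(name, param.grad().shape)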


#58

Why is the shape of x (3, 5) while the shape of w in the h1 layer is (4, 5)? I thought it should be (5, 4).
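This is because Gluon's Dense layer stores its weight with shape (units, in_units) and computes roughly y = dot(x, w.T) + b, so with a 5-dimensional input and 4 hidden units the weight ends up as (4, 5). A small sketch:

from mxnet import nd
from mxnet.gluon import nn

dense = nn.Dense(4)
dense.initialize()
x = nd.random.uniform(shape=(3, 5))
y = dense(x)
print(dense.weight.data().shape)  # (4, 5), i.e. (units, in_units)
print(y.shape)                    # (3, 4)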


#59

@mli


#60

It reports mxnet.base.MXNetError: Shape inconsistent, Provided = [10,15], inferred shape=(10,10)

This shouldn't depend on what k is, should it?


#61

… I'm stuck. My own model keeps throwing errors no matter what I do, and this example errors out as well…


#62

Can collect_params() report parameters that were initialized on the GPU? For me it aborts right after printing. It feels like a bug; I've filed an issue on GitHub.


#63

I tried applying parameter sharing to the earlier dropout code (two 256-unit Dense layers, with the dropout removed) and got a shape error. What is going on? Isn't the shape supposed to be inferred?

MXNetError                                Traceback (most recent call last)
<ipython-input-...> in <module>()
     53     with autograd.record():
     54         print(data.shape)
---> 55         output = net(data)
     56
     57     loss = softmax_cross_entropy(output, label)

~/.local/lib/python3.6/site-packages/mxnet/gluon/block.py in __call__(self, *args)
    359     def __call__(self, *args):
    360         """Calls forward. Only accepts positional arguments."""
--> 361         return self.forward(*args)
    362
    363     def forward(self, *args):

~/.local/lib/python3.6/site-packages/mxnet/gluon/nn/basic_layers.py in forward(self, x)
     51     def forward(self, x):
     52         for block in self._children:
---> 53             x = block(x)
     54         return x
     55

~/.local/lib/python3.6/site-packages/mxnet/gluon/block.py in __call__(self, *args)
    359     def __call__(self, *args):
    360         """Calls forward. Only accepts positional arguments."""
--> 361         return self.forward(*args)
    362
    363     def forward(self, *args):

~/.local/lib/python3.6/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
    574                 return self._call_cached_op(x, *args)
    575             params = {i: j.data(ctx) for i, j in self._reg_params.items()}
--> 576             return self.hybrid_forward(ndarray, x, *args, **params)
    577
    578         assert isinstance(x, Symbol), \

~/.local/lib/python3.6/site-packages/mxnet/gluon/nn/basic_layers.py in hybrid_forward(self, F, x, weight, bias)
    204     def hybrid_forward(self, F, x, weight, bias=None):
    205         act = F.FullyConnected(x, weight, bias, no_bias=bias is None, num_hidden=self._units,
--> 206                                flatten=self._flatten, name='fwd')
    207         if self.act is not None:
    208             act = self.act(act)

~/.local/lib/python3.6/site-packages/mxnet/ndarray/register.py in FullyConnected(data, weight, bias, num_hidden, no_bias, flatten, out, name, **kwargs)

~/.local/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out)
     90                 c_str_array(keys),
     91                 c_str_array([str(s) for s in vals]),
---> 92                 ctypes.byref(out_stypes)))
     93
     94     if original_output is not None:

~/.local/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
    147     """
    148     if ret != 0:
--> 149         raise MXNetError(py_str(_LIB.MXGetLastError()))
    150
    151

MXNetError: Shape inconsistent, Provided = [256,784], inferred shape=(256,256)


#64
def step(self, batch_size, ignore_stale_grad=False):
    """Makes one step of parameter update. Should be called after
    `autograd.compute_gradient` and outside of `record()` scope.
    """
    # (excerpt; kvstore handling omitted)
    for i, param in enumerate(self._params):
        for upd, arr, grad in zip(self._updaters, param.list_data(), param.list_grad()):
            if not ignore_stale_grad or arr._fresh_grad:
                upd(i, grad, arr)
                arr._fresh_grad = False

I took a look at the trainer source code. After each update, the weight dynamically gets an attribute _fresh_grad = False added to it. When weights are shared and the same set of weights would be updated a second time, the check sees that the weight is no longer fresh and skips the update.
So that's right: it is only updated once.
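A quick sanity check of that conclusion (a minimal sketch with the tutorial's shared net): after a training step the two layers that share parameters still hold exactly the same values, because there is only one underlying Parameter being updated.

from mxnet import nd, autograd, gluon
from mxnet.gluon import nn

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(4, activation="relu"))
    net.add(nn.Dense(4, activation="relu"))
    net.add(nn.Dense(4, activation="relu", params=net[-1].params))
    net.add(nn.Dense(2))
net.initialize()

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
x = nd.random.uniform(shape=(3, 5))
with autograd.record():
    loss = net(x).sum()
loss.backward()
trainer.step(batch_size=3)

print((net[1].weight.data() == net[2].weight.data()).min())  # all 1.0: still identical after the update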