How to implement something like Caffe's iter_size in MXNet


#1

My batch_size is too small, so I'd like to implement something like Caffe's iter_size, i.e., only update the network parameters after processing iter_size * batch_size samples. I'm not clear on how loss.backward() and trainer.step() are implemented under the hood, so I'd like to ask for advice.


#2

You may need to save the parameter gradients for each batch, and then update manually once iter_size * batch_size samples have been processed.
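A rough sketch of that idea, assuming net, trainer, loss_fn, batch_size, iter_size and a data_iter yielding (data, label) batches are already defined (all of these names are just placeholders):

import mxnet as mx
from mxnet import autograd

# Parameters that actually receive gradients, plus one accumulator buffer each.
params = [p for p in net.collect_params().values() if p.grad_req != 'null']
accum = [mx.nd.zeros_like(p.grad()) for p in params]

for i, (data, label) in enumerate(data_iter):
    with autograd.record():
        l = loss_fn(net(data), label)
    l.backward()
    for a, p in zip(accum, params):
        a += p.grad()                         # save this batch's gradients
    if (i + 1) % iter_size == 0:
        for a, p in zip(accum, params):
            p.grad()[:] = a                   # write the summed gradient back
            a[:] = 0                          # reset the buffer
        trainer.step(iter_size * batch_size)  # one update for all accumulated samples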


#3

OK, thanks.


#4

MXNet supports accumulating gradients. For example:

from mxnet import autograd

for p in net.collect_params().values():
    p.grad_req = 'add'                      # accumulate gradients instead of overwriting

for i in range(100):
    net.collect_params().zero_grad()        # clear the accumulated gradients
    for j in range(iter_size):
        with autograd.record():
            y = net(data)
        y.backward()
    trainer.step(iter_size * batch_size)    # one update after iter_size batches
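Note that Trainer.step(batch_size) normalizes the gradient by 1/batch_size, so passing iter_size * batch_size here averages the accumulated gradient over all samples processed since the last zero_grad().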

#5

This approach breaks after net.hybridize(): the gradients still get overwritten.

import mxnet as mx
from mxnet import autograd as ag, gluon as gl

data = mx.nd.random.uniform(shape=(1,3,224,224), dtype='float32')
label = mx.nd.random.uniform(shape=(1), dtype='float32')
label[:] = 1
loss = gl.loss.SoftmaxCrossEntropyLoss()

net = gl.model_zoo.vision.resnet18_v1()
net.initialize()

#net.hybridize()

for v in net.collect_params().values():
    v.grad_req = 'add'

#net.hybridize()
    
net.collect_params().zero_grad()
with ag.record():
    pred = net(data)
    l = loss(pred, label)
    l.backward()
# one backward pass
print(net.features[0].weight.grad()[0,0,0,0])

net.collect_params().zero_grad()
with ag.record():
    pred = net(data)
    l = loss(pred, label)
    l.backward()
    pred = net(data)
    l = loss(pred, label)
    l.backward()
# two backward passes on the same data; with grad_req='add' this should be twice the value above
print(net.features[0].weight.grad()[0,0,0,0])

The test code is above; the MXNet version is mxnet-cu80==1.2.0b20180403.


#6

@piiswrong @szha could you help with a workable iter_size implementation?


#7

This does seem a bit odd. With your code I can indeed reproduce the grad_req problem, and it seems to only show up with more complex networks. With the script below, the grad_req works fine:

import mxnet as mx
from mxnet import autograd as ag, gluon as gl

data = mx.nd.random.uniform(shape=(1,3,224,224), dtype='float32')
label = mx.nd.random.uniform(shape=(1), dtype='float32')
label[:] = 1
loss = gl.loss.SoftmaxCrossEntropyLoss()

#net = gl.model_zoo.vision.resnet18_v1()
net = gl.nn.HybridSequential()
net1 = gl.nn.HybridSequential()
net1.add(gl.nn.Dense(2))
net2 = gl.nn.HybridSequential()
net2.add(gl.nn.Dense(2))
net.add(net1)
net.add(net2)
net.initialize()

net.hybridize()

for v in net.collect_params().values():
    v.grad_req = 'add'

#net.hybridize()
print(net)

net.collect_params().zero_grad()
with ag.record():
    pred = net(data)
    l = loss(pred, label)
    l.backward()
#print(net.features[0].weight.grad()[0,0,0,0])
print(net[0][0].weight.grad().mean())

net.collect_params().zero_grad()
with ag.record():
    pred = net(data)
    l = loss(pred, label)
    l.backward()
    pred = net(data)
    l = loss(pred, label)
    l.backward()
#print(net.features[0].weight.grad()[0,0,0,0])
print(net[0][0].weight.grad().mean())
Output:

HybridSequential(
  (0): HybridSequential(
    (0): Dense(None -> 2, linear)
  )
  (1): HybridSequential(
    (0): Dense(None -> 2, linear)
  )
)

[0.00607673]
<NDArray 1 @cpu(0)>

[0.01215345]
<NDArray 1 @cpu(0)>

#8

bug :joy:


#9

Found it: when hybridizing, the backend's cached op doesn't read the grad_req set on the frontend. A fix should go out as a PR soon.


#10