A problem with autograd

While using MXNet's autograd, I ran into the following problem:

import mxnet as mx
import numpy as np


a = mx.nd.array(np.random.random([3, 10]))
a.attach_grad()
b = mx.nd.array(np.random.random([3, 10]))
b.attach_grad()

class Net(mx.gluon.nn.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = mx.gluon.nn.Dense(10)
    def forward(self, x):
        x = self.dense(x)
        return x

net = Net()
net.initialize()

losses = []
with mx.autograd.record():
    pa = net(a)
    pb = net(b)
    for i in range(3):
        losses.append(mx.nd.sum(pa[i] * pb[i]))

for l in losses:
    l.backward()

With this code an error is raised (the second call to l.backward() fails); if the 3 below is changed to 1, it runs fine:

losses = []
with mx.autograd.record():
    pa = net(a)
    pb = net(b)
    for i in range(1):
        losses.append(mx.nd.sum(pa[i] * pb[i]))

for l in losses:
    l.backward()

print(a.grad)

I'd like to ask how I should compute the loss in this situation. The issue is that different samples within a batch may need the loss computed in different ways, so I have to compute them separately.

In other words, the loss computation is different for every sample within a batch, so you need to compute them one by one:

a = mx.nd.array(np.random.random([3, 10]))
a.attach_grad()
b = mx.nd.array(np.random.random([3, 10]))
b.attach_grad()

class Net(mx.gluon.nn.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = mx.gluon.nn.Dense(10)
    def forward(self, x):
        x = self.dense(x)
        return x

net = Net()
net.initialize()

losses = []
with mx.autograd.record():
    for i in range(3):
        pa = net(a[i])
        pb = net(b[i])
        losses.append(mx.nd.sum(pa * pb))

for l in losses:
    l.backward()

Also, you will notice that a.grad only holds the gradient from the last sample. That is because with the default grad_req='write', every backward call overwrites the NDArray's gradient buffer, so in the code above the first two samples contribute nothing and their computation is wasted. It needs to be modified like this:

a = mx.nd.array(np.random.random([3, 10]))
a.attach_grad(grad_req='add')
b = mx.nd.array(np.random.random([3, 10]))
b.attach_grad(grad_req='add')

class Net(mx.gluon.nn.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = mx.gluon.nn.Dense(10)
        self.dense.weight.grad_req = 'add'
        self.dense.bias.grad_req = 'add'
    def forward(self, x):
        x = self.dense(x)
        return x

net = Net()
net.initialize()

losses = []
with mx.autograd.record():
    for i in range(3):
        pa = net(a[i])
        pb = net(b[i])
        losses.append(mx.nd.sum(pa * pb))

for l in losses:
    l.backward()
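
One caveat with grad_req='add': the gradients now keep accumulating across backward calls, so they have to be cleared by hand before the next round of forward/backward, otherwise they just keep growing. A minimal sketch of how that could look, assuming the net and the inputs above:

for param in net.collect_params().values():
    param.zero_grad()   # reset the accumulated parameter gradients

a.grad[:] = 0           # reset the accumulated input gradients in place
b.grad[:] = 0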

Why is a.attach_grad() needed here? Isn't a just the input data?

It was already in his original question; I was also wondering why the inputs would need gradients.
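
In general attach_grad() is only needed when you want the gradient with respect to that NDArray itself; the parameters inside a Block get their gradient buffers from initialize() and are always part of the recorded graph. A minimal sketch, reusing the toy Net from above (the variable x is just for illustration):

x = mx.nd.array(np.random.random([3, 10]))   # note: no x.attach_grad()

with mx.autograd.record():
    out = mx.nd.sum(net(x))
out.backward()

print(net.dense.weight.grad())   # parameter gradients are available as usual
print(x.grad)                    # None: no gradient buffer was attached to x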

# cfg below is an external config object from the project (not shown in the thread)
from mxnet import nd
from mxnet.gluon.loss import Loss


class triplet_loss(Loss):
    def __init__(self, weight=1., batch_axis=0, **kwargs):
        super(triplet_loss, self).__init__(weight, batch_axis, **kwargs)
        # self.cnt_flag = 0

    def hybrid_forward(self, F, output):
        # self.cnt_flag += 1
        output = nd.L2Normalization(data=output, mode='instance')
        basic_loss = 0
        cfg.cnt_triplets_per_iter = 0
        # print output.grad
        for i in range(cfg.nums_lifes_per_classID):
            for j in range(cfg.nums_classID_per_device * len(cfg.gpus)):
                class_start_idx = j * cfg.nums_lifes_per_classID
                anchor_idx = j * cfg.nums_lifes_per_classID + i
                anchor = output[anchor_idx]

                # get pos embeddings (the other samples of the anchor's class)
                if i == 0:
                    pos = output[anchor_idx + 1:class_start_idx + cfg.nums_lifes_per_classID]
                elif i > 0 and i < cfg.nums_lifes_per_classID - 1:
                    pos = nd.concat(output[class_start_idx:anchor_idx],
                                    output[anchor_idx + 1:class_start_idx + cfg.nums_lifes_per_classID],
                                    dim=0)
                elif i == cfg.nums_lifes_per_classID - 1:
                    pos = output[class_start_idx:class_start_idx + cfg.nums_lifes_per_classID - 1]

                # calculate pos distance (anchor broadcasts against pos)
                # anchor_pos = anchor.broadcast_to((cfg.nums_lifes_per_classID - 1, pos.shape[1]))
                pos_dist = nd.sum(nd.square(anchor - pos), 1)
                # print pos_dist

                # get neg embeddings (the samples of all other classes)
                if j == 0:
                    neg = output[class_start_idx + cfg.nums_lifes_per_classID:cfg.batch_size_per_device * len(cfg.gpus)]
                elif j > 0 and j < cfg.nums_classID_per_device - 1:
                    neg = nd.concat(output[0:class_start_idx],
                                    output[class_start_idx + cfg.nums_lifes_per_classID:cfg.batch_size_per_device * len(cfg.gpus)],
                                    dim=0)
                elif j == cfg.nums_classID_per_device - 1:
                    neg = output[0:cfg.batch_size_per_device * len(cfg.gpus) - cfg.nums_lifes_per_classID]

                # calculate neg distance (anchor broadcasts against neg)
                # anchor_neg = anchor.broadcast_to((neg.shape[0], neg.shape[1]))
                neg_dist = nd.sum(nd.square(anchor - neg), 1)
                # print neg_dist

                # for each positive, add the first negative that violates the margin
                for k in range(pos_dist.shape[0]):
                    select_vec = (neg_dist - pos_dist[k] - cfg.alpha).asnumpy()
                    for p in range(len(select_vec)):
                        if select_vec[p] < 0:
                            cfg.cnt_triplets_per_iter += 1
                            basic_loss = basic_loss + nd.mean(
                                nd.maximum(pos_dist[k] - neg_dist[p] + cfg.alpha, 0.))
                            break
        return basic_loss / cfg.cnt_triplets_per_iter

# excerpt from the training loop (net, loss, trainer, b and cfg are defined elsewhere)
with mxnet.autograd.record():
    for i in range(len(b)):
        batch = nd.array(b[i], ctx=gpu(cfg.gpus[i]))
        feats = net(batch)[0]
    loss_ = loss(feats)
loss_.backward()
print(feats.grad)
trainer.step(cfg.batch_size)

I wrote a triplet loss earlier and it doesn't seem to work. In the code above, batch and feats shouldn't need attach_grad(), right? But printing feats.grad returns None, and only after adding feats.attach_grad() does it print anything.

I haven't read the triplet-loss code in detail, but from the autograd.record block it looks like only the feats from the last loop iteration is used to compute the gradient; the feats computed in the earlier iterations are never used.

If len(b) is 1, there is no problem, right?
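
If you want every iteration's feats to contribute, one way to restructure the record block is to keep each loss head in the graph and do a single backward; a rough sketch, reusing net / loss / trainer / b / cfg from your snippet (untested, just to show the idea):

losses = []
with mx.autograd.record():
    for i in range(len(b)):
        batch = nd.array(b[i], ctx=gpu(cfg.gpus[i]))
        feats = net(batch)[0]
        losses.append(loss(feats))   # keep every iteration's loss head in the graph
mx.autograd.backward(losses)         # one backward pass over all the heads
trainer.step(cfg.batch_size)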

Hi, thanks for your reply. Doing it that way does work.
My earlier idea was to compute pa and pb on the whole batch at once, since computing a full batch saves computation.
I'd like to ask what the difference is between computing a whole batch at once and computing one sample at a time.
Thanks 🙏

Also:
This version works:

losses = []
aa = []
bb = []
with mx.autograd.record():
    for i in range(3):
        aa.append(net(a[i]))
        bb.append(net(b[i]))
    losses.append(mx.nd.sum(mx.nd.square(aa[0] - bb[0])))
    losses.append(mx.nd.sum(mx.nd.square(aa[2] - bb[1])))
    losses.append(mx.nd.sum(mx.nd.square(aa[1] - bb[2])))
for l in losses:
    l.backward()
print(a.grad)

But this version does not work:

losses = []
aa = []
bb = []
with mx.autograd.record():
    for i in range(3):
        aa.append(net(a[i]))
        bb.append(net(b[i]))
    losses.append(mx.nd.sum(mx.nd.square(aa[0] - bb[0])))
    losses.append(mx.nd.sum(mx.nd.square(aa[2] - bb[1])))
    losses.append(mx.nd.sum(mx.nd.square(aa[1] - bb[1])))
for l in losses:
    l.backward()
print(a.grad)

Why is that?

The second one doesn't work because bb[1] is used when computing the second loss, so bb[1] already belongs to the graph built for that loss; when you then also put it into the third loss's graph, the backward fails (the first backward through bb[1] frees its part of the graph). In the first version, every aa[i] and bb[i] appears in exactly one loss, so each loss lives entirely in its own graph.
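
If you really need to reuse bb[1] in two different losses while still calling backward one by one, one option (a sketch, assuming your MXNet version supports the retain_graph flag of backward) is to keep the graph alive on all but the last call:

for l in losses[:-1]:
    l.backward(retain_graph=True)   # don't free the graph yet, bb[1] is still needed
losses[-1].backward()
print(a.grad)

Note that with the default grad_req='write' each backward still overwrites a.grad; retain_graph only keeps the graph alive.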

You don't necessarily need grad_req='add' here; you can use
autograd.backward(losses)
which is also faster.
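
Applied to the original example, a minimal sketch:

losses = []
with mx.autograd.record():
    pa = net(a)
    pb = net(b)
    for i in range(3):
        losses.append(mx.nd.sum(pa[i] * pb[i]))

mx.autograd.backward(losses)   # a single backward pass over all the loss heads
print(a.grad)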

Can that backpropagate through a list of losses and accumulate the gradients?

Does that mean I can't use for loops inside my triplet loss? After calling pos.attach_grad() and printing the gradient, it seems to only contain the gradient from the last loop iteration.
Also, does attach_grad() merely make the NDArray's gradient retrievable for printing? In the official Gluon tutorials, feat = net(x) is not followed by attach_grad().

Then what do you get when you call backward on a list of NDArrays? Are the resulting gradients accumulated?

Yes, it is equivalent to summing all the losses.
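
A quick sanity check of that on the toy example from the top of the thread (a sketch; re-attaching the grads so each run starts from a clean 'write' buffer):

# gradient of the explicit sum of the three losses
a.attach_grad(); b.attach_grad()
with mx.autograd.record():
    pa, pb = net(a), net(b)
    total = mx.nd.sum(pa[0] * pb[0]) + mx.nd.sum(pa[1] * pb[1]) + mx.nd.sum(pa[2] * pb[2])
total.backward()
g_sum = a.grad.copy()

# one backward over the list of loss heads
a.attach_grad(); b.attach_grad()
with mx.autograd.record():
    pa, pb = net(a), net(b)
    losses = [mx.nd.sum(pa[i] * pb[i]) for i in range(3)]
mx.autograd.backward(losses)

print(mx.nd.sum(mx.nd.abs(a.grad - g_sum)))   # expected to be (numerically) zero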

What you pass backward here is the loss; why isn't it the gradient that gets passed back?