How to get each layer's gradient values and feature values during backpropagation

How do I get the gradient values and feature values of each layer during backpropagation, for a neural network built with mxnet.sym?

(1) In general, it is enough to record the computation with autograd.
Example:

from mxnet import autograd, nd

x = [1, 2, 3, 4]
x = nd.array(x)
x = x.reshape((4, 1))
x.attach_grad()              # allocate storage for the gradient of x
with autograd.record():      # record the forward computation
    y = 2 * nd.dot(x.T, x)
y.backward()                 # backpropagate
print(x.grad)                # gradient of y with respect to x
Output:
[[ 4.]
 [ 8.]
 [12.]
 [16.]]
<NDArray 4x1 @cpu(0)>
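
This matches the analytic derivative: since y = 2 * (x.T dot x), the gradient is dy/dx = 4 * x. A quick sanity check, reusing the x defined above:

print(4 * x)   # analytic gradient 4 * x, identical to x.grad printed above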

(2) For a model built with nn.Sequential() or nn.HybridSequential(), you can index it with [i] to access each layer and its weights:

from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()

X = nd.random.uniform(shape=(2, 20))
Y = net(X)

print('net[0] (layer 0): \n params:\n', net[0].params, '\n params type:\n', type(net[0].params))

print('dense0_weight:\n', net[0].params['dense0_weight'], '\n weight:\n', net[0].weight)

print('weight data:\n', net[0].weight.data())

print('weight grad:\n', net[0].weight.grad())

print('dense0:\n', net[0])

print('net[1] (layer 1): \n params:\n', net[1].params)

print('bias data:\n', net[1].bias.data())

print('net all params:\n', net.collect_params())

print('weight parameters only:\n', net.collect_params('.*weight'))
Output:
net[0] (layer 0):
params:
 dense0_ (
  Parameter dense0_weight (shape=(256, 20), dtype=float32)
  Parameter dense0_bias (shape=(256,), dtype=float32)
)

params type:
 <class 'mxnet.gluon.parameter.ParameterDict'>

dense0_weight:
 Parameter dense0_weight (shape=(256, 20), dtype=float32)

weight:
 Parameter dense0_weight (shape=(256, 20), dtype=float32)

weight data:
[[ 0.06700657 -0.00369488  0.0418822  ... -0.05517294 -0.01194733
  -0.00369594]
 [-0.03296221 -0.04391347  0.03839272 ...  0.05636378  0.02545484
  -0.007007  ]
 [-0.0196689   0.01582889 -0.00881553 ...  0.01509629 -0.01908049
  -0.02449339]
 ...
 [ 0.00010955  0.0439323  -0.04911506 ...  0.06975312  0.0449558
  -0.03283203]
 [ 0.04106557  0.05671307 -0.00066976 ...  0.06387014 -0.01292654
   0.00974177]
 [ 0.00297424 -0.0281784  -0.06881659 ... -0.04047417  0.00457048
   0.05696651]]
<NDArray 256x20 @cpu(0)>

weight grad:
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
<NDArray 256x20 @cpu(0)>

dense0:
 Dense(20 -> 256, Activation(relu))

net[1] (layer 1):
params:
 dense1_ (
  Parameter dense1_weight (shape=(10, 256), dtype=float32)
  Parameter dense1_bias (shape=(10,), dtype=float32)
)

bias data:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
<NDArray 10 @cpu(0)>

net all params:
sequential0_ (
  Parameter dense0_weight (shape=(256, 20), dtype=float32)
  Parameter dense0_bias (shape=(256,), dtype=float32)
  Parameter dense1_weight (shape=(10, 256), dtype=float32)
  Parameter dense1_bias (shape=(10,), dtype=float32)
)

weight parameters only:
 sequential0_ (
  Parameter dense0_weight (shape=(256, 20), dtype=float32)
  Parameter dense1_weight (shape=(10, 256), dtype=float32)
)
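
Note that the question also asks for each layer's feature values, i.e. the forward activations. For a model built with nn.Sequential() as above, one simple way is to run the forward pass one layer at a time and keep every intermediate output. A minimal sketch, reusing the net and X defined above (the feats list is just an illustrative name):

feats = []
out = X
for i in range(len(net)):
    out = net[i](out)          # forward through layer i
    feats.append(out)          # feature values (activations) of layer i

for i, f in enumerate(feats):
    print('layer %d output shape:' % i, f.shape)

If gradients are needed as well, wrap the layer-by-layer loop in autograd.record() and call backward() on the loss afterwards.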

(3) If the model is wrapped in a class, for example:

class Network(nn.Block):
    def __init__(self, **kwargs):
        super(Network, self).__init__(**kwargs)
        self.conv = nn.Conv2D(...)
        self.stage = nn.Sequential()
        ...

    def forward(self, x):
        ...
        return out

then the weights can be retrieved as follows:

params = net.collect_params('.*weight|.*bias')
for param_name in params:
    w = params[param_name].data()   # get the parameter values (weights/biases)

(4) Finally, once backpropagation has populated the gradients, first obtain the weight w of the corresponding layer using the methods above, then read its gradient:

with autograd.record():
    y_hat = net(x)
    l = loss(y_hat, y)
l.backward()  # backpropagate; the gradients are now populated

# get the weight of the desired layer with one of the methods above
w = ...

# read its gradient (use w.grad() if w is a Parameter, w.grad if it is an NDArray)
print(w.grad)
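
Putting the pieces together, here is a minimal end-to-end sketch. It assumes the same two-layer net from (2); the random target Y and gluon's L2Loss are placeholders purely for illustration. It runs one forward/backward pass and then reads every parameter's gradient through collect_params:

from mxnet import autograd, nd
from mxnet.gluon import nn, loss as gloss

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()

X = nd.random.uniform(shape=(2, 20))    # dummy input batch
Y = nd.random.uniform(shape=(2, 10))    # dummy target, for illustration only
loss = gloss.L2Loss()

with autograd.record():
    y_hat = net(X)
    l = loss(y_hat, Y)
l.backward()                            # gradients of all parameters are now filled in

params = net.collect_params('.*weight|.*bias')
for param_name in params:
    p = params[param_name]
    # p.data() holds the current values, p.grad() the gradient from the last backward()
    print(param_name, 'grad shape:', p.grad().shape)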