Residual Networks (ResNet) discussion thread


#62

Why doesn't a 1x1 convolutional layer with stride 2 lose information? Isn't it equivalent to some positions never being used at all?
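A small sketch of my own (not from the chapter) that makes the "skipped positions" concrete: with the 1x1 kernel fixed to 1, a stride-2 convolution simply reads every other row and column of the input, so the remaining positions never contribute to the output.

from mxnet import init, nd
from mxnet.gluon import nn

conv = nn.Conv2D(1, kernel_size=1, strides=2, use_bias=False)
conv.initialize(init=init.Constant(1))          # kernel fixed to 1
X = nd.arange(16).reshape((1, 1, 4, 4))
print(conv(X))  # equals X[:, :, ::2, ::2]: only positions (0,0), (0,2), (2,0), (2,2) are read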


#63

Here nn.AvgPool2D already acts as a nonlinear smoothing, so there's no need to add BN and an activation function after it, right?


#64

Check the video, or find an earlier commit on GitHub and look at the code as it was back then.


#65

conv3 is there precisely to partially preserve the original information; if you apply a BN + activation step before it, even more of the original information gets lost. That's how I see it, anyway.


#66

I modified the structure of the residual block according to the exercise:

from mxnet import nd
from mxnet.gluon import nn

class Residual(nn.Block):
    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1, strides=strides)
        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)

        if use_1x1conv:
            self.conv3 = nn.Conv2D(num_channels, kernel_size=1, strides=strides)
        else:
            self.conv3 = None

        self.bn1 = nn.BatchNorm()
        self.bn2 = nn.BatchNorm()

    def forward(self, X):
        # The "convolution, batch normalization, activation" order is
        # changed to "batch normalization, activation, convolution"
        Y = self.conv1(nd.relu(self.bn1(X)))
        Y = self.conv2(nd.relu(self.bn2(Y)))
        if self.conv3:
            X = self.conv3(X)
        return Y + X
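A quick sanity check on the modified block, mirroring the shape test used in the chapter (the expected shapes below are my own, based on the layer arithmetic): without the 1x1 convolution the input shape is preserved, and with use_1x1conv=True and strides=2 the spatial size is halved.

blk = Residual(3)
blk.initialize()
X = nd.random.uniform(shape=(4, 3, 6, 6))
print(blk(X).shape)  # (4, 3, 6, 6)

blk = Residual(6, use_1x1conv=True, strides=2)
blk.initialize()
print(blk(X).shape)  # (4, 6, 3, 3)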

Then I found that the learning rate needs adjusting, otherwise the loss blows up:

lr, num_epochs, batch_size, ctx = 0.001, 5, 512, d2l.try_gpu()
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size,resize=28)
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
              num_epochs)

Results:

training on gpu(0)
epoch 1, loss 0.6878, train acc 0.808, test acc 0.846, time 20.6 sec
epoch 2, loss 0.3005, train acc 0.889, test acc 0.849, time 21.0 sec
epoch 3, loss 0.2503, train acc 0.907, test acc 0.900, time 20.7 sec
epoch 4, loss 0.2164, train acc 0.920, test acc 0.883, time 20.9 sec
epoch 5, loss 0.1893, train acc 0.929, test acc 0.898, time 20.8 sec

A tiny bit better than my original network with the 28x28 input:

lr, num_epochs, batch_size, ctx = 0.05, 5, 256, d2l.try_gpu()
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
              num_epochs)

Results:

training on gpu(0)
epoch 1, loss 0.6228, train acc 0.801, test acc 0.850, time 22.9 sec
epoch 2, loss 0.3153, train acc 0.882, test acc 0.885, time 20.1 sec
epoch 3, loss 0.2649, train acc 0.901, test acc 0.890, time 20.1 sec
epoch 4, loss 0.2316, train acc 0.913, test acc 0.891, time 20.7 sec
epoch 5, loss 0.2081, train acc 0.921, test acc 0.872, time 20.6 sec

Also attaching the results with a slightly larger learning rate:

lr, num_epochs, batch_size, ctx = 0.05, 5, 512, d2l.try_gpu()
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size,resize=28)
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
              num_epochs)

training on gpu(0)
epoch 1, loss 1037.8718, train acc 0.324, test acc 0.360, time 20.8 sec
epoch 2, loss 9.4007, train acc 0.467, test acc 0.450, time 20.9 sec
epoch 3, loss 5.2146, train acc 0.504, test acc 0.433, time 21.0 sec
epoch 4, loss 2.8268, train acc 0.527, test acc 0.547, time 20.9 sec
epoch 5, loss 1.8387, train acc 0.579, test acc 0.474, time 21.0 sec

#67

def forward(self, X):
    Y = nd.relu(self.bn1(self.conv1(X)))
    Y = self.bn2(self.conv2(Y))
    if self.conv3:
        X = self.conv3(X)
    return nd.relu(Y + X)

I'd like to ask: why isn't a BN layer added after conv3? Is there a particular reason for that?
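For reference, here is a sketch (my own, not the book's code) of what normalizing the shortcut branch would look like; some reference implementations, e.g. the original ResNet, do apply BN after the 1x1 projection, while the chapter keeps the shortcut as a bare convolution.

from mxnet import nd
from mxnet.gluon import nn

class ResidualBNShortcut(nn.Block):
    """Variant of the chapter's Residual block that also normalizes the shortcut."""
    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(ResidualBNShortcut, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1, strides=strides)
        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm()
        self.bn2 = nn.BatchNorm()
        if use_1x1conv:
            self.conv3 = nn.Conv2D(num_channels, kernel_size=1, strides=strides)
            self.bn3 = nn.BatchNorm()  # BN on the projection shortcut
        else:
            self.conv3 = None

    def forward(self, X):
        Y = nd.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.bn3(self.conv3(X))  # the only change vs. the chapter's block
        return nd.relu(Y + X)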


#68

Hi, may I ask how you solved it? I removed the output layer of the original ResNet and added my own fully connected layer, and then got the same error as you. Could you share your code?
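Not the original poster, but here is a minimal sketch of the kind of change I understand you to mean (the hidden width is my own guess, and the chapter's four resnet_block stages are elided): keep the backbone up to global average pooling, then append your own fully connected head.

from mxnet import init, nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
        nn.BatchNorm(), nn.Activation('relu'),
        nn.MaxPool2D(pool_size=3, strides=2, padding=1))
# ... the chapter's four resnet_block stages go here ...
net.add(nn.GlobalAvgPool2D(),
        nn.Dense(128, activation='relu'),  # hypothetical extra FC layer
        nn.Dense(10))                      # new output layer
net.initialize(init=init.Xavier())
print(net(nd.random.uniform(shape=(1, 1, 96, 96))).shape)  # (1, 10)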