Residual Networks (ResNet) Discussion Board


#42

Try updating to the latest version? It should have reduced memory usage. 2 GB is indeed on the small side, but you can set a smaller batch size.


#43

You should add a BN and an activation layer at the start of block 6;
otherwise the data coming out of the preceding convolution layer effectively never passes through an activation.

I don't understand. The ResNet-18 in the tutorial doesn't add them either, so why does it work fine there?


#44

In the ResNet_V2 version, the feature map coming out of each residual block has not yet been activated, so you need to add a BN and a ReLU after the last residual block.
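In code, that means appending something like the following after the final residual stage (a sketch in the tutorial's Gluon style; net and num_classes here are placeholders):

    from mxnet.gluon import nn

    # The output of the last residual block is still "pre-activation",
    # so normalize and activate it before global pooling and the dense layer.
    net.add(nn.BatchNorm(),
            nn.Activation('relu'),
            nn.GlobalAvgPool2D(),
            nn.Dense(num_classes))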


#45

A question for the experts on the forum: given data of shape (None, 100, 300), how can I get an output of shape (None, 100, 256) from Conv1D(channels=256, kernel_size=2, layout='NCW')? Since the kernel width is 2, no amount of padding gets the output width to exactly 100, and the layout doesn't support 'NWC'. For now I transpose the input to (None, 300, 100) before feeding it in. This is urgent: I've reproduced a paper and want to write a blog post about it. Many thanks!!
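One possible workaround, sketched below using the shapes from the question (padding by concatenating a zero column is just one option, not the only way): transpose the (batch, width, channels) input to (batch, channels, width) for layout='NCW', and pad the width axis by one column on one side so that kernel_size=2 with stride 1 still produces width 100.

    from mxnet import nd
    from mxnet.gluon import nn

    # Input as described in the question: (batch, width=100, channels=300).
    x = nd.random.uniform(shape=(4, 100, 300))

    # Conv1D with layout='NCW' expects (batch, channels, width).
    x_ncw = nd.transpose(x, axes=(0, 2, 1))              # (4, 300, 100)

    # kernel_size=2 with stride 1 shrinks the width by 1, so append one zero
    # column along the width axis to keep the output width at 100.
    pad = nd.zeros((x_ncw.shape[0], x_ncw.shape[1], 1))
    x_padded = nd.concat(x_ncw, pad, dim=2)              # (4, 300, 101)

    conv = nn.Conv1D(channels=256, kernel_size=2, layout='NCW')
    conv.initialize()
    y = conv(x_padded)                                   # (4, 256, 100)

    # Transpose back if the rest of the model expects (batch, width, channels).
    y = nd.transpose(y, axes=(0, 2, 1))                  # (4, 100, 256)
    print(y.shape)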


#46

While trying to implement ResNet v2, I ran into the problem below. It looks as if loss.backward() has no effect. How should I debug this?

UserWarning: Gradient of Parameter resnet181_conv1_weight on context gpu(1) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient


#47

I think I found the problem: inside the residual unit I should have used out, but I wrote x instead.
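For anyone who hits the same warning, a hypothetical reconstruction of that kind of typo (the variable names here are only illustrative): if the second convolution is fed x instead of out, the first conv/BN pair drops out of the graph that leads to the loss, so their parameters never receive gradients and the trainer flags them as stale.

    # Inside the residual unit's forward():
    out = nd.relu(self.bn1(self.conv1(x)))
    # Buggy: passing x here leaves conv1/bn1 unused in the loss computation,
    # so their gradients are never updated by backward().
    # out = self.bn2(self.conv2(x))
    # Fixed:
    out = self.bn2(self.conv2(out))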


#48

Is there a convenient way to visualize the computation graph of the loss, to make debugging easier?
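One option, sketched below on the assumption that net is an initialized HybridBlock: hybridize it, run one forward pass to build the graph, export the symbol, and plot it with mx.viz.plot_network. This shows the forward graph rather than the loss's backward graph, but it is usually enough to spot wiring mistakes.

    import mxnet as mx
    from mxnet import nd

    net.hybridize()                                  # net: an initialized HybridBlock
    net(nd.random.uniform(shape=(1, 3, 32, 32)))     # one forward pass to build the graph
    net.export('resnet_v2')                          # writes resnet_v2-symbol.json + params

    sym = mx.sym.load('resnet_v2-symbol.json')
    graph = mx.viz.plot_network(sym, shape={'data': (1, 3, 32, 32)})
    graph.render('resnet_v2_graph')                  # needs graphviz installed; writes a PDF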


#49

After running it I get a NotImplementedError. What could be wrong? I typed the code in exactly as in the tutorial.


#50

I tried changing the Conv -> BN -> ReLU order in Residual to BN -> ReLU -> Conv:

    def forward(self, x):
        #out = nd.relu(self.bn1(self.conv1(x)))
        #out = self.bn2(self.conv2(out))
        #if not self.same_shape:
        #    x = self.conv3(x)
        #return nd.relu(out + x)
        
        out = self.conv1(nd.relu(self.bn1(x)))
        out = self.conv2(nd.relu(self.bn2(out)))
        if not self.same_shape:
            x = self.conv3(x)
        return out + x

Comparing with Figure 1 of the paper, this should be the right change. However, the results became much worse:

  • Original

Epoch 0. Loss: 0.438, Train acc 0.85, Test acc 0.89, Time 417.1 sec

  • Modified

Epoch 0. Loss: nan, Train acc 0.10, Test acc 0.10, Time 406.3 sec

Could anyone point out what's going wrong? :smiley:


#51

Should it be changed like this? Since your x has already gone through a conv, putting it through another conv right away feels odd:

 x = self.conv3(nd.relu(self.bn3(x)))

#52

I don't think so; x = self.conv3(x) is the simpler shortcut path inside Residual.


#53

I tried implementing ResNet-50 with the nn.HybridBlock class and got an error. Any help would be appreciated.


#54

The revised version doesn't feel as intuitive and easy to follow as before. Where can I look at the content from before the revision?


#55

How can I save the feature maps of ResNet's intermediate layers?
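One way, sketched below on the assumption that the network's layers live in an nn.Sequential / nn.HybridSequential (for example a net.features container, as in the tutorial's ResNet): push the input through the sub-blocks one at a time and keep each intermediate output.

    from mxnet import nd

    feature_maps = {}
    out = X                                   # X: a batch of input images
    for i, layer in enumerate(net.features):  # net.features: a (Hybrid)Sequential
        out = layer(out)
        feature_maps[i] = out                 # feature map after the i-th layer
        print(i, layer.name, out.shape)

    # Save a particular feature map to disk, e.g. the output of layer 3.
    nd.save('layer3_fmap.nd', feature_maps[3])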


#56

Lower the learning rate a bit, e.g. lr=0.001.
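For reference, a minimal sketch of where that learning rate is set, assuming the tutorial's usual SGD trainer and an already-defined net:

    from mxnet import gluon

    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.001})
    # Or lower it mid-training without rebuilding the trainer:
    trainer.set_learning_rate(0.001)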


#57

from mxnet import nd
from mxnet.gluon import nn

class Residual(nn.Block):
    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1, strides=strides)
        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            # 1x1 convolution on the shortcut path to match shapes.
            self.conv3 = nn.Conv2D(num_channels, kernel_size=1, strides=strides)
            self.bn3 = nn.BatchNorm()
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm()
        self.bn2 = nn.BatchNorm()

    def forward(self, X):
        # Pre-activation ordering (BN -> ReLU -> Conv), as in ResNet v2.
        if self.conv3:
            X = nd.relu(self.bn3(X))
        Y = self.conv1(nd.relu(self.bn1(X)))
        Y = self.conv2(nd.relu(self.bn2(Y)))
        if self.conv3:
            X = self.conv3(X)
        return Y + X

Give this a try.
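A quick shape check for the block above (a sketch; the input size is just an example):

    blk = Residual(6, use_1x1conv=True, strides=2)
    blk.initialize()
    X = nd.random.uniform(shape=(4, 3, 96, 96))
    print(blk(X).shape)   # expected: (4, 6, 48, 48)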


#58

If thumbnail is True, the first layer of ResNet uses a small 3x3 kernel instead of the original 7x7 one. How does that affect the results?


#59

Hi, I trained with your code and lowered the learning rate to lr=1e-3, but the results were not as good as ResNet v1. Did v2 turn out better than v1 on your side?


#60

A question: in MXNet's ResNet v2 I don't see the BatchNorm -> activation -> conv structure anywhere. Could someone explain the ResNetV2 network structure?

class ResNetV2(HybridBlock):
    r"""ResNet V2 model from
    `"Identity Mappings in Deep Residual Networks" <https://arxiv.org/abs/1603.05027>`_ paper.

    Parameters
    ----------
    block : HybridBlock
        Class for the residual block. Options are BasicBlockV1, BottleneckV1.
    layers : list of int
        Numbers of layers in each block
    channels : list of int
        Numbers of channels in each block. Length should be one larger than layers list.
    classes : int, default 1000
        Number of classification classes.
    thumbnail : bool, default False
        Enable thumbnail.
    """
    def __init__(self, block, layers, channels, classes=1000, thumbnail=False, **kwargs):
        super(ResNetV2, self).__init__(**kwargs)
        assert len(layers) == len(channels) - 1
        with self.name_scope():
            self.features = nn.HybridSequential(prefix='')
            self.features.add(nn.BatchNorm(scale=False, center=False))
            if thumbnail:
                self.features.add(_conv3x3(channels[0], 1, 0))
            else:
                self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
                self.features.add(nn.BatchNorm())
                self.features.add(nn.Activation('relu'))
                self.features.add(nn.MaxPool2D(3, 2, 1))

            in_channels = channels[0]
            for i, num_layer in enumerate(layers):
                stride = 1 if i == 0 else 2
                self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                   stride, i+1, in_channels=in_channels))
                in_channels = channels[i+1]
            self.features.add(nn.BatchNorm())
            self.features.add(nn.Activation('relu'))
            self.features.add(nn.GlobalAvgPool2D())
            self.features.add(nn.Flatten())

            self.output = nn.Dense(classes, in_units=in_channels)

    def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0):
        layer = nn.HybridSequential(prefix='stage%d_'%stage_index)
        with layer.name_scope():
            layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                            prefix=''))
            for _ in range(layers-1):
                layer.add(block(channels, 1, False, in_channels=channels, prefix=''))
        return layer

    def hybrid_forward(self, F, x):
        x = self.features(x)
        x = self.output(x)
        return x

Let me quote the readme of that repo here:
it says that for networks of up to 152 layers trained on CIFAR-10, ResNet v1 performs better than ResNet v2.
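As for where the BN -> ReLU -> Conv ordering actually appears: the ResNetV2 class above only assembles the stages; the pre-activation pattern lives inside the residual block classes that are passed in as block. Roughly like the sketch below (a simplified illustration of the pattern, not the exact model-zoo source):

    from mxnet.gluon import nn

    class BasicBlockV2Sketch(nn.HybridBlock):
        """Pre-activation residual block: BN -> ReLU -> Conv, twice."""
        def __init__(self, channels, stride, downsample=False, in_channels=0, **kwargs):
            super(BasicBlockV2Sketch, self).__init__(**kwargs)
            self.bn1 = nn.BatchNorm()
            self.conv1 = nn.Conv2D(channels, 3, stride, 1, use_bias=False, in_channels=in_channels)
            self.bn2 = nn.BatchNorm()
            self.conv2 = nn.Conv2D(channels, 3, 1, 1, use_bias=False, in_channels=channels)
            self.downsample = (nn.Conv2D(channels, 1, stride, use_bias=False, in_channels=in_channels)
                               if downsample else None)

        def hybrid_forward(self, F, x):
            residual = x
            x = F.Activation(self.bn1(x), act_type='relu')    # BN -> ReLU before the first conv
            if self.downsample:
                residual = self.downsample(x)                 # project the shortcut if shapes change
            x = self.conv1(x)
            x = F.Activation(self.bn2(x), act_type='relu')    # BN -> ReLU before the second conv
            x = self.conv2(x)
            return x + residual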


#61

A question: I'm using the ResNet structure from the official tutorial. With batch size 128 everything runs fine, but with 256 it throws an error. My GPU is a GTX 980 Ti with 6 GB of memory, so 256 shouldn't run out of memory. I'm running ResNet inside Docker; searching online didn't turn up anything. Any idea what the cause could be?