Convolutional Neural Networks from Scratch: Discussion


#1

http://zh.d2l.ai/chapter_convolutional-neural-networks/conv-layer.html


#2

In this example, label = label.as_in_context(ctx) explicitly moves the label to the GPU, but data is not moved, even though data has to be computed against the weights and biases that live on the GPU.

  1. Why does this work?
  2. Manually specifying GPU/CPU for every variable feels tedious. Is there another way?

#3
  1. The transfer of data happens inside the net function (X = X.as_in_context(W1.context)).
  2. For NDArray variables you can usually set the device with the ctx= parameter, which is more concise than as_in_context. For example, data can be defined as
data = nd.arange(18, ctx=mx.gpu(0)).reshape((1, 2, 3, 3))

#4

You can set a default ctx, but I find that less explicit, so the notebook doesn't use it:

with mx.Context(mx.gpu(0)):
    a = mx.nd.zeros((3, 2))
    b = ..

#5

Hi teacher, when I create arrays this way I run into the problem shown below. (screenshot)
It only works after I change it to:

with mx.Context(mx.gpu()):
    cc = nd.random_normal(shape=(5, 2), ctx=mx.gpu())
    aa = nd.ones((3, 4))
    print(cc, aa)

What is going on here?


#6

(screenshot)


#7

The conclusion of this chapter says "we can see that the convolutional neural network achieves better classification accuracy than the earlier multilayer perceptron." But in the actual training output, whether from scratch or with Gluon, isn't the CNN's classification accuracy worse than the corresponding multilayer perceptron's?


#8 (pinned)

#9

import mxnet as mx
import matplotlib.pyplot as plt

def transform(data, label):
    return data.astype('float32') / 255, label.astype('float32')

mnist_train = mx.gluon.data.vision.FashionMNIST(train=True, transform=transform)
mnist_test = mx.gluon.data.vision.FashionMNIST(train=False, transform=transform)

def show_images(images):
    n = len(images)
    _, figs = plt.subplots(1, n, figsize=(15, 15))
    for i in range(n):
        figs[i].imshow(images[i].reshape((28, 28)).asnumpy())
        figs[i].axes.get_xaxis().set_visible(False)
        figs[i].axes.get_yaxis().set_visible(False)
    plt.show()

def get_text_labels(label):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in label]

batch_size = 256
train_data = mx.gluon.data.DataLoader(mnist_train, batch_size, True)
test_data = mx.gluon.data.DataLoader(mnist_test, batch_size, False)

try:
    ctx = mx.gpu()
    _ = mx.nd.zeros([1], ctx=ctx)
except:
    ctx = mx.cpu()

weight_scale = 0.01
num_outputs = 10
num_fc = 128

W1 = mx.nd.random_normal(shape=[20, 1, 5, 5], scale=weight_scale, ctx=ctx)
b1 = mx.nd.zeros(W1.shape[0], ctx=ctx)

W2 = mx.nd.random_normal(shape=[50, 20, 3, 3], scale=weight_scale, ctx=ctx)
b2 = mx.nd.zeros(W2.shape[0], ctx=ctx)

W3 = mx.nd.random_normal(shape=[1250, 128], scale=weight_scale, ctx=ctx)
b3 = mx.nd.zeros(W3.shape[1], ctx=ctx)

W4 = mx.nd.random_normal(shape=[128, 10], scale=weight_scale, ctx=ctx)
b4 = mx.nd.zeros(W4.shape[1], ctx=ctx)

params = [W1, b1, W2, b2, W3, b3, W4, b4]

for param in params:
    param.attach_grad()

def net(X):
    X = X.as_in_context(W1.context)

    h1_conv = mx.nd.Convolution(data=X, weight=W1, bias=b1, kernel=[5, 5], num_filter=20)
    h1_acti = mx.nd.relu(h1_conv)
    h1_pool = mx.nd.Pooling(data=h1_acti, pool_type="avg", kernel=[2, 2], stride=[2, 2])

    h2_conv = mx.nd.Convolution(data=h1_pool, weight=W2, bias=b2, kernel=[3, 3], num_filter=50)
    h2_acti = mx.nd.relu(h2_conv)
    h2_pool = mx.nd.Pooling(data=h2_acti, pool_type="max", kernel=[2, 2], stride=[2, 2])

    h2 = mx.nd.Flatten(data=h2_pool)

    h3 = mx.nd.relu(mx.nd.dot(h2, W3) + b3)
    h4 = mx.nd.relu(mx.nd.dot(h3, W4) + b4)

    print('1st conv block:', h1_pool.shape)
    print('2nd conv block:', h2.shape)
    print('1st dense:', h3.shape)
    print('2nd dense:', h4.shape)
    print('output:', h4)

    return h4

softmax_cross_entropy = mx.gluon.loss.SoftmaxCrossEntropyLoss()

learning_rate = 0.1

def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad

def accuracy(output, true_label):
    return mx.nd.mean(output.argmax(axis=1) == true_label).asscalar()

def evec_accuracy(data_iter, net):
    acc = 0.0
    for data_temp, lable_temp in data_iter:
        output = net(data_temp)
        acc += accuracy(output, lable_temp)
    return acc / len(data_iter)

for epoch in range(5):
    train_loss = 0.0
    train_acc = 0.0

    for x, y in train_data:
        y = y.as_in_context(ctx)
        with mx.gluon.autograd.record():
            out = net(x)
            loss = softmax_cross_entropy(out, y)
        loss.backward()
        SGD(params, learning_rate / batch_size)

        train_loss += mx.nd.mean(loss).asscalar()
        train_acc += accuracy(out, y)

    test_acc = evec_accuracy(test_data, net)

    print("Epoch %d. Loss: %f, Train acc %f, Test acc %f" % (
        epoch, train_loss / len(train_data), train_acc / len(train_data), test_acc))

data_x, label_y = mnist_test[0:10]
show_images(data_x)
print('true labels')
print(get_text_labels(label_y))

predicted_labels = net(data_x).argmax(axis=1)
print('predicted labels')
print(get_text_labels(predicted_labels.asnumpy()))

The error message is:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2017.1.1\helpers\pydev\pydevd.py", line 1585, in
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm 2017.1.1\helpers\pydev\pydevd.py", line 1015, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "C:/Users/maomaochong/PycharmProjects/StudySpeak/cnn.py", line 105, in
    out = net(x)
  File "C:/Users/maomaochong/PycharmProjects/StudySpeak/cnn.py", line 59, in net
    h1_conv = mx.nd.Convolution(data=X, weight=W1, bias=b1, kernel=[5,5], num_filter=20)
  File "", line 51, in Convolution
  File "C:\Users\maomaochong\Anaconda2\lib\site-packages\mxnet\_ctypes\ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "C:\Users\maomaochong\Anaconda2\lib\site-packages\mxnet\base.py", line 143, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Shape inconsistent, Provided=(20,1,5,5), inferred shape=(20,28,5,5)

help!!!


On channels_{first,last} for ndarray.Convolution and ndarray.Pooling
#11

That one is a bug: random_normal currently doesn't check the default context that has been set. It should be fixed shortly.

The default arguments weren't wired up properly :sweat_smile:

As for the crash above, it looks like the shape of the X fed into net is the problem. It should be (batch_size, channel, height, width); here channel should be 1, but the input probably has 28.


#13

My problem is solved: the input data needs to be reshaped to the dimensions [-1, 1, 28, 28].
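The fix can be sketched with numpy standing in for NDArray (a minimal sketch; the batch layout assumed here is the NHWC layout the Fashion-MNIST DataLoader in this thread produces):

```python
import numpy as np

# A Fashion-MNIST batch from the DataLoader arrives as
# (batch_size, height, width, channel) = (256, 28, 28, 1),
# but mx.nd.Convolution expects NCHW: (batch, channel, height, width).
batch = np.zeros((256, 28, 28, 1), dtype=np.float32)

# Reshape to NCHW before calling net.
x = batch.reshape((-1, 1, 28, 28))
print(x.shape)  # (256, 1, 28, 28)
```

Without the reshape, a 28-element axis lands in the channel position, which matches the "inferred shape=(20,28,5,5)" in the traceback above.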


#14

(screenshot: 2017-09-16 12-11-22)

Isn't the weight format comment there written wrong? It should be output_filter x input_filter.


#15

I think it is written wrong too. Teacher, please take a look.


#16

It does appear to be wrong. From the official documentation:

For general 2-D convolution, the shapes are

data: (batch_size, channel, height, width)
weight: (num_filter, channel, kernel[0], kernel[1])
bias: (num_filter,)
out: (batch_size, num_filter, out_height, out_width).
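As a quick sanity check of these documented layouts, here is the shape arithmetic for the first conv layer of the code in this thread, as plain tuples (a sketch only; it assumes the no-padding, stride-1 rule out = in - kernel + 1):

```python
# Documented layouts, filled in with the shapes from this thread:
data = (256, 1, 28, 28)   # (batch_size, channel, height, width)
weight = (20, 1, 5, 5)    # (num_filter, channel, kernel[0], kernel[1])
bias = (20,)              # (num_filter,)

# With no padding and stride 1, each spatial dim shrinks by kernel - 1.
out = (data[0], weight[0], data[2] - weight[2] + 1, data[3] - weight[3] + 1)
print(out)  # (256, 20, 24, 24) -- (batch_size, num_filter, out_height, out_width)
```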


#17

I worked through this carefully today.

For the conv filter weight w, the official sources give two descriptions:

  1. The comment in the text: the weight format is input_filter x output_filter x height x width

  2. The API docs: weight: (num_filter, channel, kernel[0], kernel[1])

Here height and width correspond to kernel[0] and kernel[1]; nothing to debate there.

Printing the parameters of the two conv layers in this section:

(screenshot of the printed parameter shapes)

shows that for both W1 and W2, shape[0] is always the number of kernels (num_filter) of that conv layer.

After convolution, each kernel maps the input data (data for the first layer, pooled_h1 for the second) to one output channel, so the output's (relued_h1 for the first layer, relued_h2 for the second) shape[1] always matches W.shape[0] (for example, W1.shape[0] == relued_h1.shape[1]).

At the same time, relued_h1.shape[1] == W2.shape[1].

Compare with the input data format, batch x channel x height x width: bingo.

So the best reading of W's format is description 2: weight: (num_filter, channel, kernel[0], kernel[1]).

As for "the weight format is input_filter x output_filter x height x width", I think that reading really is wrong:

W1.shape[0] == relued_h1.shape[1] shows the weight's first axis is output_filter;

relued_h1.shape[1] == W2.shape[1] shows the weight's second axis is input_filter.


Note: work out why these identities hold (it helps memorization).
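The two identities can also be checked mechanically with the shapes used in this section (plain tuples, no mxnet needed; the conv output shape is filled in under the no-padding, stride-1 rule):

```python
W1 = (20, 1, 5, 5)             # (num_filter, channel, k0, k1)
W2 = (50, 20, 3, 3)
relued_h1 = (256, 20, 24, 24)  # conv output: its channel axis comes from W1[0]

assert W1[0] == relued_h1[1]   # the weight's first axis is output_filter
assert relued_h1[1] == W2[1]   # the next weight's second axis is input_filter
print("both identities hold")
```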


#18

Nice work; I admire your thoroughness. I myself easily mix up the conv layer's W and the fully connected layer's W.
(screenshot: 2017-09-26 21-33-38)

W1.shape[0] = output channels
W1.shape[1] = input channels

For the fully connected layer's weight:
W3.shape[0] = input dim
W3.shape[1] = output dim
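A small sketch of why the dense W is oriented the other way around: the forward pass is dot(h, W), so W's first axis has to match h's feature axis (numpy as a stand-in, with the shapes from this notebook):

```python
import numpy as np

h2 = np.zeros((256, 1250))  # flattened conv output: (batch, features)
W3 = np.zeros((1250, 128))  # dense weight: (input dim, output dim)

# The matrix multiply fixes the orientation: (256, 1250) x (1250, 128).
h3 = h2.dot(W3)
print(h3.shape)  # (256, 128)
```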


#19

Haha, nicely summarized; good to meet a kindred spirit. I also worked it out by hand on paper and reached the same conclusion: for the conv layer's W versus the fully connected layer's W, the input/output positions in the text really are opposite. (Not a big deal, though; it's just how the docs lay it out. Swapping the positions does no harm as long as you are careful with the matrix multiplication.)

Revisiting cnn-gluon, in practice we don't even need to go to this much trouble: when building the net, in_channels (the official API name for what you call input channels) can be omitted; only the channels argument of Conv2D() needs to be set (this channels corresponds to your output channels).

Sorry to ramble; all of this just confirms we think alike. Changing the original comment

# the weight format is input_filter x output_filter x height x width

to

# output_channels x in_channels x height x width

would be best.


#20

Handshake! It feels better and better to find kindred spirits in this community.


#21

Would one of you like to submit a PR to fix it?


#22

Submitted! It's my first PR, so please bear with any mistakes, Mu.

:joy: Being a contributor feels pretty good.