Help: how to compute a metric on only some of the outputs of symbol.Group

This discussion is based on:
MXNet version: 1.2.0
Operating system: CentOS 7.4

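As the title says,
I want to use symbol.Group to implement multiple outputs,
but I want the evaluation to use only one of those outputs.

So far I have written a custom evaluator with metric.CustomMetric().
When the output is a symbol.Group, the metric raises a ValueError: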

labels, preds = check_label_shapes(labels, preds, True)
"predictions {}".format(label_shape, pred_shape))
ValueError: Shape of labels 1 does not match shape of predictions 2

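I found that metric.py in the Python source checks the shapes of labels and preds by default.
Because preds contains multiple outputs while the label corresponds to only one of them, the check fails.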
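
To make the failure concrete, here is a simplified stand-in for that check, written from the error message above rather than copied from MXNet. The name `check_lengths` and the placeholder strings are hypothetical. With one label array and two grouped outputs the counts are 1 and 2, which matches the reported mismatch.

```python
def check_lengths(labels, preds):
    """Simplified stand-in for the check in metric.py (hypothetical helper).

    Compares how many label arrays and how many output arrays were passed,
    not the shapes of the arrays themselves.
    """
    if len(labels) != len(preds):
        raise ValueError(
            "Shape of labels {} does not match shape of predictions {}".format(
                len(labels), len(preds)))
    return labels, preds


# Placeholder strings stand in for the real arrays.
labels = ["softmax_label"]               # one label array
preds = ["cls_output", "aux_output"]     # two outputs from symbol.Group

try:
    check_lengths(labels, preds)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```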

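The workaround I have come up with is to add blank labels to labels as well,
so that its shape matches the multiple outputs, while the CustomMetric still uses only one output and one label.
Does anyone know a better solution? :joy: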

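You can subclass the metric base class directly (mx.metric.EvalMetric in MXNet 1.x) and write your own update; that way check_label_shapes is never called.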

Got it, got it. Thank you! ~:laughing:


How did you solve this problem?

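If you write the training and validation loops explicitly yourself, this is easy. Just use mod.get_outputs() to grab the one output you need and compute the metric from it.
It is enough to implement the following template: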


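class MyMetric(object):
    def __init__(self, params):
        pass  # store settings and initialize the accumulators
    def update(self, label, predict, is_Train):
        pass  # pick the needed output from predict and accumulate
    def reset(self):
        pass  # clear the accumulators at the start of each epoch
    def show(self):
        pass  # print the current metric value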
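
As one possible way to fill in this template, here is a minimal sketch of an accuracy metric that scores a single chosen output and ignores the rest. The class name `SingleOutputAccuracy`, the `output_index` parameter, and the `get` method are hypothetical additions, and the inputs are plain Python lists so the example runs without MXNet. In a real loop you would first convert each NDArray, for example with `arr.asnumpy().tolist()`, and pass the class ids from `batch.label[0]`.

```python
class SingleOutputAccuracy(object):
    """Accuracy computed from one chosen output of a multi-output network.

    Hypothetical example working on plain Python lists.
    """

    def __init__(self, output_index=0):
        self.output_index = output_index  # which grouped output to score
        self.reset()

    def reset(self):
        self.num_correct = 0
        self.num_samples = 0

    def update(self, label, predict, is_Train=True):
        # label: list of class ids, one per sample.
        # predict: list of outputs, each a list of per-sample score rows.
        # Only predict[self.output_index] is used; other outputs are ignored.
        scores = predict[self.output_index]
        for target, row in zip(label, scores):
            guess = max(range(len(row)), key=lambda i: row[i])  # argmax
            self.num_correct += int(guess == int(target))
            self.num_samples += 1

    def get(self):
        if self.num_samples == 0:
            return float("nan")
        return self.num_correct / float(self.num_samples)

    def show(self):
        print("accuracy: %.4f" % self.get())


metric = SingleOutputAccuracy(output_index=0)
cls_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # output 0: class scores
aux_output = [[1.0], [2.0], [3.0]]                  # output 1: ignored
metric.update([1, 0, 0], [cls_scores, aux_output], is_Train=False)
metric.show()
```

Because the metric never compares the number of labels with the number of outputs, no blank labels are needed.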

@LiuYang
Thank you. How do you verify the metric itself? Do you debug it directly during training, or is there a test function for testing a metric? My metric is a bit complicated.

Are your train and validation metrics different? If so, you can implement two versions of the code above, one for train and one for valid. The running logic is as follows.


# pdb.set_trace()
for epoch in range(params['start_epoch']+1, params['total_epoch']+1):
    print('epoch: %.3d' % epoch)
    train_iter.reset()
    train_metric.reset()
    iter_count = -1
    print('Training')
    for batch in train_iter:
        # pdb.set_trace()
        iter_count += 1
        mod.forward(batch, is_train=True)
        train_metric.update(batch.label, mod.get_outputs(), is_Train=True)
        mod.backward()
        mod.update()
        # if iter_count % 100 == 0:
        #      train_metric.show()
    train_metric.show()

    # mod.save_checkpoint(params['save_prefix'], epoch)
    print('Testing')
    valid_iter.reset()
    test_metric.reset()
    iter_count = -1
    for batch in valid_iter:
        iter_count += 1
        mod.forward(batch, is_train=False)
        test_metric.update(batch.label, mod.get_outputs(), is_Train=False)
        # if iter_count % 100 == 0:
        #     metric.show()
        # pdb.set_trace()
    test_metric.show()

@LiuYang
This is very helpful, thanks! I will try it.

Thanks, I followed your method, but it reported this error:

Traceback (most recent call last):
  File "train_0723.py", line 434, in <module>
    main()
  File "train_0723.py", line 430, in main
    train_net(args)
  File "train_0723.py", line 424, in train_net
    epoch_end_callback=epoch_cb)
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 546, in fit
    self.forward_backward(data_batch, eval_metric)
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 416, in forward_backward
    eval_metric.update(data_batch.label[0], preds[0], )
  File "/usr/local/lib/python2.7/dist-packages/mxnet/metric.py", line 318, in update
    metric.update(labels, preds)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/metric.py", line 1136, in update
    prob = pred[numpy.arange(label.shape[0]), numpy.int64(label)]
IndexError: index 163316 is out of bounds for axis 1 with size 512

I don't know what the labels and preds are required to look like or how to make them correct. Could you help me?
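
A note on what the failing line expects, offered as a sketch rather than a diagnosis of the code above: `pred[numpy.arange(label.shape[0]), numpy.int64(label)]` picks, for each sample, the entry of `pred` at that sample's label. So `pred` must have shape (batch_size, num_classes), and every label must be smaller than num_classes. A width of 512 with a label of 163316 suggests that `preds[0]` may be a 512-dimensional feature output rather than the class-probability output, though that is only a guess. The hypothetical helper `pick_label_scores` below mirrors the indexing with plain lists and reports the mismatch early.

```python
def pick_label_scores(pred, label):
    """Return pred[i][label[i]] for every sample i (pure-Python mirror).

    pred: list of rows, one per sample, each of length num_classes.
    label: list of integer class ids, one per sample.
    """
    picked = []
    for i, (row, target) in enumerate(zip(pred, label)):
        target = int(target)
        if not 0 <= target < len(row):
            raise IndexError(
                "sample %d: label %d needs an output with more than %d "
                "columns, got %d" % (i, target, target, len(row)))
        picked.append(row[target])
    return picked


probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]  # (batch=2, num_classes=3)
print(pick_label_scores(probs, [0, 2]))

features = [[0.0] * 512]                     # a 512-wide output, not class scores
try:
    pick_label_scores(features, [163316])
    error = None
except IndexError as err:
    error = str(err)
print(error)
```

If the guess is right, passing the output whose second dimension equals the number of classes (or a metric that does not index by class id) would avoid the error.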