Flat training loss and low training accuracy

Why does the computation logic feeding my loss_function() (specifically policy_output_discrete and value_output_discrete) result in a flat training loss and low training accuracy?

# Forward Pass
policy_output, value_output = net(_board_features_and_turn)

# Since both policy_output and value_output are of continuous probability nature,
# we need to change them to discrete number for loss_function() computation
policy_output_discrete = torch.zeros(len(_score), NUM_OF_POSSIBLE_MOVES, requires_grad=True)
if USE_CUDA:
    policy_output_discrete = policy_output_discrete.cuda()

for topk_index in range(len(_score)):  # one-hot encode the argmax (a hard version of softmax)
    policy_output_discrete[topk_index][policy_output.topk(1).indices[topk_index]] = 1

# subtract 1 because score is one of the values [-1, 0, 1]
value_output_discrete = torch.topk(value_output, 1).indices - 1

# Loss at each iteration by comparing to target(moves)
loss1 = loss_function(policy_output_discrete, move)
# Loss at each iteration by comparing to target(score)
loss2 = loss_function(value_output_discrete, _score)

loss = loss1 + loss2

# Backpropagating gradient of loss
optimizer.zero_grad()
loss.backward()
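For reference, here is a minimal standalone reproduction of the two discretization steps above on dummy tensors (the shapes and the value of NUM_OF_POSSIBLE_MOVES are assumptions for illustration; scatter_ is used here instead of in-place indexing so the snippet runs on its own). It also shows that both discrete results come from topk(...).indices, which returns integer index tensors that carry no grad_fn:

```python
import torch

NUM_OF_POSSIBLE_MOVES = 4  # assumed small value for illustration

# Dummy network outputs standing in for policy_output and value_output
policy_output = torch.tensor([[0.10, 0.70, 0.15, 0.05],
                              [0.30, 0.20, 0.40, 0.10]], requires_grad=True)
value_output = torch.tensor([[0.2, 0.5, 0.3]], requires_grad=True)

# One-hot of the argmax, as in the question's loop
policy_output_discrete = torch.zeros(2, NUM_OF_POSSIBLE_MOVES)
policy_output_discrete.scatter_(1, policy_output.topk(1).indices, 1.0)
print(policy_output_discrete)  # rows one-hot at indices 1 and 2

# Map the winning class index {0, 1, 2} to a score in {-1, 0, 1}
value_output_discrete = torch.topk(value_output, 1).indices - 1
print(value_output_discrete)

# .indices is an integer tensor detached from the autograd graph
print(value_output_discrete.grad_fn)  # None
```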

I have since solved the flat-training-loss problem.

Training and test accuracy for move are both above 95%, but training and test accuracy for score are both below 30%.

Why is that?