This question follows Does slice or index of chainer.Variable to get item in chainer has backward ability? Consider a typical example: suppose I have a convolutional layer followed by an FC layer, and the last FC layer outputs a vector.
In some cases I must slice this vector to compute the loss function. For example, in multi-label classification, most elements of the ground-truth label vector are 0 and only a few are 1. In this situation, applying F.sigmoid_cross_entropy to the whole vector directly may suffer from label imbalance, so I decided to use a[0, 1] (where a is the chainer.Variable output by the last FC layer) to slice out specific elements and compute the loss on them only.
In this situation, how does the gradient flow back (BP) through the last FC layer, and how does that layer update its weight matrix?
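To make the setup concrete, here is a minimal sketch in plain NumPy (not Chainer; the array values, the balanced-sampling strategy, and the manual sigmoid-cross-entropy formula are all illustrative assumptions, not code from the question) of computing the loss only on a sliced, balanced subset of the output indices:

```python
import numpy as np

# Hypothetical stand-ins for the FC layer output and the ground-truth labels.
rng = np.random.default_rng(0)
scores = rng.normal(size=10)                      # FC output (one sample)
labels = np.array([0, 1, 0, 0, 0, 0, 0, 1, 0, 0])  # mostly 0, few 1

# Build a balanced index set: all positives plus an equal number of
# randomly chosen negatives, to counter the label imbalance.
pos = np.flatnonzero(labels == 1)
neg = rng.choice(np.flatnonzero(labels == 0), size=len(pos), replace=False)
index = np.concatenate([pos, neg])

# Numerically stable sigmoid cross entropy, computed on the slice only.
s = scores[index]
t = labels[index]
loss = np.mean(np.maximum(s, 0) - s * t + np.log1p(np.exp(-np.abs(s))))
```

In Chainer itself, the slicing step would be `a[index]` on the output `Variable`, followed by `F.sigmoid_cross_entropy` on the sliced elements.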
When you write b = a[index] for a Variable a and a slice index (possibly fancy indexing), backpropagating through this operation copies the values of b.grad into a.grad[index], leaving the other elements of a.grad zero (because the corresponding elements of a do not affect the loss value). The backward pass of the last FC layer then computes the gradients w.r.t. its weight matrix and bias vector as usual from this a.grad.
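The backward rule described above can be sketched in plain NumPy (the array values and the choice of a 1-D `a` are illustrative assumptions; Chainer performs this scatter internally when you call `loss.backward()`):

```python
import numpy as np

a = np.array([0.3, -1.2, 0.7, 2.0])   # stand-in for the FC layer output
index = np.array([0, 2])              # the sliced positions

b = a[index]                          # forward: b = a[index]

b_grad = np.ones_like(b)              # pretend dL/db = 1 for each sliced element
a_grad = np.zeros_like(a)             # unselected elements get zero gradient
a_grad[index] = b_grad                # scatter b.grad back into a.grad[index]

# a_grad is now [1., 0., 1., 0.]: only the selected elements carry gradient,
# and this a_grad is what the FC layer's backward uses to update W and b.
```

Note that if the same index appears more than once in fancy indexing, the gradients for that element should accumulate (as with `np.add.at`) rather than overwrite; the simple assignment above assumes the indices are distinct.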