I am creating a neural network from scratch for MNIST data, so I have 10 classes in the output layer. I need to perform backpropagation, and for that I need to calculate $dA \cdot dZ$ for the last layer, where $dA$ is the derivative of the loss function $L$ w.r.t. the softmax activation function $A$, and $dZ$ is the derivative of the softmax activation function $A$ w.r.t. $z$, where $z = wx + b$. The size obtained for $dA$ is $10 \times 1$, whereas the size obtained for $dZ$ is $10 \times 10$.
Is this correct? If yes, how do I multiply $dA \cdot dZ$ when they have different dimensions?
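For reference, here is a minimal NumPy sketch of the setup described above. It assumes a cross-entropy loss with a one-hot target (the question does not name the loss, so `y` and the `dA` formula are illustrative assumptions), and it reproduces the $10 \times 1$ and $10 \times 10$ shapes:

```python
import numpy as np

# Minimal sketch of the shapes in question, assuming cross-entropy loss
# with a hypothetical one-hot target y and a softmax output A.
np.random.seed(0)
z = np.random.randn(10, 1)           # pre-activation, z = wx + b, shape (10, 1)
A = np.exp(z) / np.sum(np.exp(z))    # softmax output, shape (10, 1)
y = np.zeros((10, 1)); y[3] = 1.0    # hypothetical one-hot target

dA = -y / A                          # dL/dA for cross-entropy, shape (10, 1)
dZ = np.diagflat(A) - A @ A.T        # softmax Jacobian dA/dz, shape (10, 10)

print(dA.shape, dZ.shape)            # (10, 1) (10, 10)
```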
You are almost there. However, you need to transpose $dA$, e.g. with `numpy.transpose(dA)`. Then the dimensions of $dA$ ($1 \times 10$ after transposing) and $dZ$ ($10 \times 10$) are compatible, and the matrix multiplication yields a $1 \times 10$ gradient w.r.t. $z$.
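Here is a minimal, self-contained sketch of that multiplication, under the shapes assumed in the question (the values of `dA` and the softmax output are dummies, just to make the example runnable):

```python
import numpy as np

# Stand-in values with the shapes from the question: a (10, 1) dA and
# the (10, 10) softmax Jacobian dZ (which is symmetric by construction).
rng = np.random.default_rng(0)
a = rng.random((10, 1)); a /= a.sum()   # stand-in softmax output, shape (10, 1)
dA = rng.standard_normal((10, 1))       # stand-in dL/dA, shape (10, 1)
dZ = np.diagflat(a) - a @ a.T           # softmax Jacobian, shape (10, 10)

# Transpose dA so the shapes line up: (1, 10) @ (10, 10) -> (1, 10)
dL_dz = np.transpose(dA) @ dZ
print(dL_dz.shape)                      # (1, 10)

# Because the softmax Jacobian is symmetric, dZ @ dA gives the same
# numbers as a (10, 1) column vector, which may suit the rest of the
# backward pass better if your gradients are stored as columns.
assert np.allclose(dL_dz.T, dZ @ dA)
```

The transpose only changes the orientation of the gradient, not its values; many implementations keep the column orientation throughout and compute `dZ @ dA` directly instead.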