I was watching the online CS 231n lecture from Stanford, and I have a question; maybe I'm getting confused for some reason. The link is: the video
Go to 35:46: in the backward function, the formula for dx is

    dx = self.y * dz

I don't get that, since

    z = x * y

so I would expect

    dx = dz / y

Can someone please explain to me why there is a difference?
This is just unusual notation in his code (dz, dx, and dy are not used in their usual calculus sense). The variable dz here denotes the derivative of the loss function L (of the complete neural network) with respect to z, while the derivatives of L with respect to x and y are written dx and dy. The derivative of z with respect to x, which is y, is simply given by self.y. With these notations in mind, the rest follows from the chain rule:

    dL/dx = dL/dz * dz/dx = dz * y = dz * self.y
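For concreteness, here is a minimal sketch of such a multiply gate in Python. The class and method names (MultiplyGate, forward, backward) follow the style used in the lecture, but this is my own reconstruction rather than the exact code from the video:

    class MultiplyGate:
        """Computes z = x * y and backpropagates an upstream gradient dz."""

        def forward(self, x, y):
            # Cache the inputs; they are needed for the local gradients.
            self.x = x
            self.y = y
            return x * y

        def backward(self, dz):
            # dz is dL/dz, the gradient of the loss w.r.t. the output z.
            # Chain rule: dL/dx = dL/dz * dz/dx = dz * y, and symmetrically for dy.
            dx = self.y * dz
            dy = self.x * dz
            return dx, dy

    gate = MultiplyGate()
    z = gate.forward(3.0, -4.0)   # z = -12.0
    dx, dy = gate.backward(1.0)   # dx = -4.0 (= y), dy = 3.0 (= x)

So dx is not the differential of x; it is dL/dx, which is why it is multiplied by self.y rather than divided by it.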