backpropagation

CS231N Lecture 4 Back Prop - Chain Rule


I am sure this has a simple answer! I am asking to improve my understanding.

The diagram below is a modification of the one in CS231N Back Propagation: [diagram: Back Propagation Through Time]

If the Chain Rule is applied to get the delta for y, the gradient will be dy = -4, according to the diagram.

Applying Chain Rule Notation: df/dy = df/dq * dq/dy
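
Written out in full, the same rule looks like this. This is only a sketch: it assumes the diagram is the lecture's circuit f(x, y, z) = (x + y)z with q = x + y, which is consistent with the values x = -2, y = 5, z = -4, q = 3, f = -12 used in this question, although the function itself is not stated in the text.

```latex
\begin{align*}
q &= x + y, \qquad f = q z \\
\frac{\partial f}{\partial q} &= z = -4, \qquad \frac{\partial q}{\partial y} = 1 \\
\frac{\partial f}{\partial y} &= \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y} = (-4)(1) = -4
\end{align*}
```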

Numerically:

// Values from the diagram
double x = -2;
double y = 5;
double q = 3;
double z = -4;
double f = -12;

// Gradients as shown in the diagram
double df = 1;
double dz = 3;
double dq = -4;
// Chain rule as I understand it
double dy = df * dq;
double dx = df * dq;

Where: df = df/df = 1 as shown above, and dq = df/dq = -4 as shown above. Thus: 1(df) * -4(dq) = -4(dy). Or have I got this completely wrong?

Where are the numerical values actually coming from in the diagram? Is this a gradient-only numerical chain, or are we deriving from the other input values? The reason I ask is that on page 48 there is a slightly confusing code example: [image: code example from page 48]

I am reading the (/) sign in df/dy as a division, and I think this is wrong? df/dy = df/dq * dq/dy = 1/-4 * -4/-4 = -0.25. What is the purpose of one number over the other here?

Is it that df/dy and dy are the same thing, with dy symbolising the "dy of df", meaning one gradient flowing back in time?

Apologies, I am somewhat confused.


Solution

  • A refresher on Differential Equations helped clear up the confusion: https://www.khanacademy.org/math/differential-equations/first-order-differential-equations/differential-equations-intro/v/differential-equation-introduction

    Confusion is the greatest problem for learning!
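
To make the numbers in the question concrete, here is a minimal Java sketch of the forward and backward pass. It assumes the diagram is the circuit f(x, y, z) = (x + y) * z with q = x + y, which matches the values in the question but is not stated there. The class and method names (ChainRuleSketch, forward, backward, numericalDfDy) are made up for illustration and do not come from the lecture.

```java
// Sketch only: assumes f(x, y, z) = (x + y) * z with q = x + y.
final class ChainRuleSketch {

    // Forward pass: q = x + y, then f = q * z.
    static double forward(double x, double y, double z) {
        double q = x + y;
        return q * z;
    }

    // Backward pass: returns {df/dx, df/dy, df/dz}.
    static double[] backward(double x, double y, double z) {
        double q = x + y;
        double df = 1.0;      // df/df, the seed gradient at the output
        double dq = z * df;   // df/dq = z, because f = q * z
        double dz = q * df;   // df/dz = q, because f = q * z
        double dqdx = 1.0;    // local gradient dq/dx, because q = x + y
        double dqdy = 1.0;    // local gradient dq/dy, because q = x + y
        double dx = dq * dqdx; // chain rule: df/dx = df/dq * dq/dx
        double dy = dq * dqdy; // chain rule: df/dy = df/dq * dq/dy
        return new double[] {dx, dy, dz};
    }

    // Central finite difference: nudge y by a small h and measure the change in f.
    static double numericalDfDy(double x, double y, double z, double h) {
        return (forward(x, y + h, z) - forward(x, y - h, z)) / (2.0 * h);
    }

    public static void main(String[] args) {
        double x = -2, y = 5, z = -4;
        double[] g = backward(x, y, z);
        System.out.println("f  = " + forward(x, y, z));
        System.out.println("dx = " + g[0] + ", dy = " + g[1] + ", dz = " + g[2]);
        System.out.println("numerical df/dy = " + numericalDfDy(x, y, z, 1e-5));
    }
}
```

Under that assumption, dy in the code is just a short variable name for df/dy. It is one number, not one stored value divided by another. The slash stands for "how much f changes per small change in y", which is what numericalDfDy approximates. The diagram's -4 for dy comes from multiplying the upstream gradient dq = -4 by the local gradient dq/dy = 1, not from df * dq.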