Tags: neural-network, recurrent-neural-network

What is the difference between Feedforward Neural Networks (ANN) and Recurrent Neural Networks (RNN)?


In an ANN, the equation during Forward Propagation is Y = W.X + b.

What is the equation during Forward Propagation for an RNN, since it involves States and Timesteps?

What is the difference between ANN and RNN in terms of Back Propagation?

Also, what is the difference in functionality between Dropout in an ANN and Recurrent_Dropout in an RNN?

Are there any other key differences between ANN and RNN?


Solution

  • The equation for Forward Propagation of an RNN, considering two Timesteps, in a simple form, is shown below:

    Output of the First Time Step: Y0 = (Wx * X0) + b

    Output of the Second Time Step: Y1 = (Wx * X1) + (Wy * Y0) + b, where Y0 = (Wx * X0) + b

    To elaborate, if the RNN layer has 5 Neurons/Units, the more detailed equation is shown in the screenshot below:

    Equation of Forward Propagation of RNN
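
    As a concrete illustration, below is a minimal NumPy sketch of that two-Timestep Forward Propagation; the dimensions and random weights are purely illustrative, and a real RNN cell would usually also wrap each step in an activation such as tanh:

        import numpy as np

        rng = np.random.default_rng(0)

        # Illustrative dimensions: a batch of 4 samples, 3 input features, 5 recurrent units
        n_inputs, n_units = 3, 5
        X0 = rng.normal(size=(4, n_inputs))   # inputs at the first time step
        X1 = rng.normal(size=(4, n_inputs))   # inputs at the second time step

        Wx = rng.normal(size=(n_inputs, n_units))  # input-to-hidden weights (shared across time steps)
        Wy = rng.normal(size=(n_units, n_units))   # hidden-to-hidden (recurrent) weights
        b = np.zeros(n_units)                      # bias

        # First time step: Y0 = (Wx * X0) + b
        Y0 = X0 @ Wx + b
        # Second time step: Y1 = (Wx * X1) + (Wy * Y0) + b -- the previous output feeds back in
        Y1 = X1 @ Wx + Y0 @ Wy + b

        # Note: in practice each step is usually passed through an activation, e.g. Y0 = np.tanh(X0 @ Wx + b)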

    Back Propagation in RNN:

    • The Back Propagation in an RNN is done through each and every Timestep. Hence it is called Backpropagation Through Time (BPTT).
    • The output sequence is evaluated using a cost function C(Y(t_min), Y(t_min+1), ..., Y(t_max)) (where t_min and t_max are the first and last output time steps, not counting the ignored outputs), and the gradients of that cost function are propagated backward through the unrolled network.
    • Finally, the model parameters are updated using the gradients computed during BPTT.
    • Note that the gradients flow backward through all the outputs used by the cost function, not just through the final output.

    In the screenshot below, Dashed Lines represent Forward Propagation and Solid Lines represent Back Propagation.

    Flow of Forward Propagation and Back Propagation in RNN
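
    To make the bullet points above concrete, here is a hand-written BPTT sketch for the same two-Timestep RNN as in the Forward Propagation example. The squared-error cost over both outputs is only illustrative; the key point is that Y0 receives a gradient both directly from the cost and from Y1 through the Recurrent weights Wy, and that the shared parameters accumulate gradients from every time step:

        import numpy as np

        rng = np.random.default_rng(0)

        # Illustrative toy data: 4 samples, 3 input features, 5 units, 2 time steps, random targets
        n_inputs, n_units = 3, 5
        X0, X1 = rng.normal(size=(4, n_inputs)), rng.normal(size=(4, n_inputs))
        T0, T1 = rng.normal(size=(4, n_units)), rng.normal(size=(4, n_units))

        Wx = rng.normal(size=(n_inputs, n_units)) * 0.1  # input weights (shared across time steps)
        Wy = rng.normal(size=(n_units, n_units)) * 0.1   # recurrent weights (shared across time steps)
        b = np.zeros(n_units)

        # Forward pass, unrolled over the two time steps (same equations as above)
        Y0 = X0 @ Wx + b
        Y1 = X1 @ Wx + Y0 @ Wy + b

        # Cost uses BOTH outputs: C = mean((Y0 - T0)^2) + mean((Y1 - T1)^2)
        C = np.mean((Y0 - T0) ** 2) + np.mean((Y1 - T1) ** 2)

        # Backpropagation Through Time: gradients flow back through every time step
        dY1 = 2 * (Y1 - T1) / Y1.size              # cost gradient w.r.t. the last output
        # Y0 gets a gradient from the cost directly AND from Y1 via the recurrent weights
        dY0 = 2 * (Y0 - T0) / Y0.size + dY1 @ Wy.T

        # Parameter gradients accumulate contributions from both time steps
        dWx = X1.T @ dY1 + X0.T @ dY0
        dWy = Y0.T @ dY1
        db = dY1.sum(axis=0) + dY0.sum(axis=0)

        # Finally, update the shared parameters using the BPTT gradients
        lr = 0.01
        Wx -= lr * dWx
        Wy -= lr * dWy
        b -= lr * db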

    Dropout: If we set the value of Dropout to 0.1 in a Recurrent Layer (LSTM), it means that 10% of the Input connections to that layer are randomly dropped during training, i.e. only 90% of the Inputs are passed to the Recurrent Layer.

    Recurrent Dropout: If we set the value of Recurrent Dropout to 0.2 in a Recurrent Layer (LSTM), it means that 20% of the Recurrent connections (the state passed from one Time Step to the next) are randomly dropped during training, so only 80% of the recurrent state is carried forward at each Time Step.
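
    In Keras, both of these are simply arguments of the recurrent layer. A minimal sketch (the input shape and layer sizes below are purely illustrative) showing them side by side:

        import tensorflow as tf

        # Hypothetical input: sequences of 10 time steps with 8 features each
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(10, 8)),
            # dropout=0.1: drops 10% of the input connections to the LSTM during training
            # recurrent_dropout=0.2: drops 20% of the recurrent (state-to-state) connections
            tf.keras.layers.LSTM(32, dropout=0.1, recurrent_dropout=0.2),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        model.summary()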

    Hope this answers all your queries!