Tags: python, machine-learning, keras, lstm

Kernel and Recurrent Kernel in Keras LSTMs


I'm trying to picture the structure of an LSTM in my mind, and I don't understand what the Kernel and Recurrent Kernel are. According to this post, in the LSTMs section, the Kernel is the four matrices that are multiplied by the input, and the Recurrent Kernel is the four matrices that are multiplied by the hidden state. But what are those four matrices in this diagram?

[image: LSTM cell diagram]

Are they the gates?

I was testing with this app how the units argument of the code below affects the kernel, recurrent kernel and bias:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(units=1, input_shape=(1, look_back)))
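For reference, the same shapes can also be printed directly, without the app; continuing from the snippet above (with look_back = 1), a minimal sketch:

kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape, recurrent_kernel.shape, bias.shape)
# with units = 1 and look_back = 1: (1, 4) (1, 4) (4,)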

With look_back = 1 it returns this:

[image: kernel, recurrent kernel and bias for units = 1]

With units = 2 it returns this:

[image: kernel, recurrent kernel and bias for units = 2]

With units = 3, this:

[image: kernel, recurrent kernel and bias for units = 3]

Testing with these values, I could deduce these expressions:

kernel: <1 x (4u)>
recurrent kernel: <u x (4u)>
bias: <4u>

but I don't know how this works inside. What do <1x(4u)> and <ux(4u)> mean? (u = units)


Solution

  • The kernels are basically the weights handled by the LSTM cell

units = neurons, as in a classic multilayer perceptron

It is not shown in your diagram, but the input is a vector X with one or more values, and each value is fed into a neuron with its own weight w (which we are going to learn with backpropagation)

    The four matrices are these (expressed as Wf, Wi, Wc, Wo):

ft = sigmoid(Wf[ht-1, xt] + bf)
it = sigmoid(Wi[ht-1, xt] + bi)
c~t = tanh(Wc[ht-1, xt] + bc)
ot = sigmoid(Wo[ht-1, xt] + bo)

When you add a neuron, you are adding another four weights (kernels)

So for your input vector X you have four matrices, and therefore:

    1 * 4 * units = kernel
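A quick check of that formula (a minimal sketch, assuming Keras 2.x and an input of dimension 1, as in the question):

from keras.models import Sequential
from keras.layers import LSTM

for units in (1, 2, 3):
    model = Sequential()
    model.add(LSTM(units=units, input_shape=(1, 1)))
    kernel = model.layers[0].get_weights()[0]
    # the four W matrices stacked side by side: (input_dim, 4 * units)
    print(units, kernel.shape)  # (1, 4), then (1, 8), then (1, 12)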
    

Regarding the recurrent_kernel, here you can find the answer. Basically, in Keras the input and the hidden state are not concatenated as in the example diagrams (W[ht-1, xt]); they are split and handled by another four matrices, called U. This is how Keras handles the input xt and the hidden state ht-1:

ft = sigmoid(Wf·xt + Uf·ht-1 + bf)
it = sigmoid(Wi·xt + Ui·ht-1 + bi)
c~t = tanh(Wc·xt + Uc·ht-1 + bc)
ot = sigmoid(Wo·xt + Uo·ht-1 + bo)

Because there is one hidden state value per neuron, the weights U (all four of them) amount to:

    units * (4 * units) = recurrent kernel
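And the matching check for the recurrent kernel (same sketch, same assumptions):

from keras.models import Sequential
from keras.layers import LSTM

for units in (1, 2, 3):
    model = Sequential()
    model.add(LSTM(units=units, input_shape=(1, 1)))
    recurrent_kernel = model.layers[0].get_weights()[1]
    # the four U matrices stacked side by side: (units, 4 * units)
    print(units, recurrent_kernel.shape)  # (1, 4), then (2, 8), then (3, 12)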
    

ht-1 comes back recurrently from all of your neurons. As in a multilayer perceptron, each neuron's output feeds into all the neurons of the next recurrent step; see the sketch below.
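To make the split concrete, here is a sketch of how the four W and U matrices can be recovered by slicing the stored kernels column-wise. The gate ordering i, f, c, o is a Keras implementation detail, so it is worth double-checking against your version:

from keras.models import Sequential
from keras.layers import LSTM

units = 3
model = Sequential()
model.add(LSTM(units=units, input_shape=(1, 1)))
kernel, recurrent_kernel, bias = model.layers[0].get_weights()

# slice the stacked kernels into per-gate matrices (assumed order: i, f, c, o)
W = {g: kernel[:, i * units:(i + 1) * units] for i, g in enumerate("ifco")}
U = {g: recurrent_kernel[:, i * units:(i + 1) * units] for i, g in enumerate("ifco")}
print(W["f"].shape, U["f"].shape)  # (1, 3) and (3, 3)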

    source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/