Tags: neural-network, keras, lstm, recurrent-neural-network, sequential

How do LSTM units incorporate spatial or sequential information?


As I understand it, LSTM units are linked in sequence: each unit produces an output and passes it on to the next LSTM unit in the chain.

However, don't you feed your entire input into every LSTM unit? I don't see how this chain reflects the structure of data in which the sequential order matters.

Can someone explain where I'm going wrong? I'm particularly interested in the LSTM implementation in Keras, but any answer is very welcome!


Solution

  • No, LSTM units are all parallel.

    The sequence exists only in the data itself: one of its dimensions is set aside as what Keras calls the time steps. Data passed to an LSTM is shaped as (Batch Size, Time Steps, Data Size).

    The sequence occurs in "time steps", but all units work in parallel.

    Even an LSTM with only one unit will still process the time steps in sequence, as the minimal sketch below illustrates.
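
    A minimal sketch of those shapes, assuming TensorFlow's Keras (the layer and argument names are standard Keras API; the sizes are made up for illustration):

        import numpy as np
        from tensorflow import keras
        from tensorflow.keras import layers

        time_steps = 10   # length of each sequence
        features = 3      # values observed at each time step

        # Even a single LSTM unit still walks through all 10 time steps.
        model = keras.Sequential([
            layers.Input(shape=(time_steps, features)),
            layers.LSTM(1),
        ])

        # One batch of 4 sequences, shaped (Batch Size, Time Steps, Data Size)
        x = np.random.random((4, time_steps, features)).astype("float32")
        print(model(x).shape)   # (4, 1): one final output per sequence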

    What happens with LSTMs is that they have a "state": an internal matrix that works like their memory. At each sequence step, "gates" (other matrices) decide, based on that step's input, whether and how much the step will change the state. There are also "forget gates", which decide whether the old state will be kept or forgotten.
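
    In rough NumPy pseudocode (a simplified sketch of the standard LSTM equations, not Keras' actual implementation; W, U and b stand for the learned weight matrices and bias vectors), one step looks like this:

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def lstm_step(x_t, h_prev, c_prev, W, U, b):
            # Every gate looks at this step's input x_t and the previous output h_prev.
            f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])  # forget gate: keep or drop old state
            i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])  # input gate: how much new info to write
            o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])  # output gate: how much state to expose
            g = np.tanh(x_t @ W["g"] + h_prev @ U["g"] + b["g"])  # candidate new content
            c_t = f * c_prev + i * g   # update the internal state ("memory")
            h_t = o * np.tanh(c_t)     # this step's output
            return h_t, c_t

        # The same unit is applied step by step, carrying h and c forward:
        #   for x_t in sequence: h, c = lstm_step(x_t, h, c, W, U, b)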


    In Keras, you can set the argument return_sequences to True or False.

    If True, the output will carry the result of every time step.
    If False, only the final result will be output.

    In both cases, units is just the "size" of the result. (Much like the units in a Dense layer or the filters in a convolutional layer: more units mean more power and more features, but not more steps.)

    The output with return_sequences=False will have only the units as its size: (Batch Size, Units).
    The output with return_sequences=True will keep the time steps: (Batch Size, Time Steps, Units).
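
    A short sketch of both cases, assuming TensorFlow's Keras (units=8 and the data sizes are arbitrary):

        import numpy as np
        from tensorflow.keras import layers

        x = np.random.random((2, 5, 3)).astype("float32")   # (Batch Size=2, Time Steps=5, Data Size=3)

        last_only = layers.LSTM(8, return_sequences=False)   # the default
        per_step = layers.LSTM(8, return_sequences=True)

        print(last_only(x).shape)   # (2, 8)     -> (Batch Size, Units)
        print(per_step(x).shape)    # (2, 5, 8)  -> (Batch Size, Time Steps, Units)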