machine-learning keras lstm recurrent-neural-network

Input Data Format for RNN

I am confused about how exactly to encode a sequence of data as input to an LSTM RNN.

In a vanilla DNN, there is one input for every label. What is the "input" in an RNN? Doesn't it have to be a set (or sequence) of data points, in order to learn the sequential events associated with a label?

I'm confused about how to encode sequential information, because it seems that more than a single input should be associated with a given label.

Solution

Let's draw up an example in code.

Say we have some sentences where each word in the sentence is encoded as a vector (for example, a vector from word2vec).
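To make that encoding concrete, here is a minimal sketch. It assumes a tiny made-up vocabulary with random 16-element vectors standing in for real word2vec embeddings; the names `toy_embeddings` and `encode_sentence` are hypothetical and only there for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for word2vec: each word maps to a 16-element vector.
vocab = ["the", "cat", "sat", "on", "mat"]
toy_embeddings = {word: rng.random(16) for word in vocab}

def encode_sentence(words):
    """Stack the word vectors in order, giving shape (timesteps, features)."""
    return np.stack([toy_embeddings[w] for w in words])

sentence = ["the", "cat", "sat", "on", "the", "mat"]
encoded = encode_sentence(sentence)
print(encoded.shape)  # (6, 16): 6 time steps, 16 features per step
```

So a single sentence is a 2D array with one row per word, and that whole array is the "input" that goes with one label.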

Suppose we want to classify each sentence into one of two classes (0 or 1). We might build a simple classifier like so:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# We have 100 examples; each one is a sequence of 10 words, and
# each word is encoded as a 16-element vector, so X has shape
# (samples, timesteps, features).

X = np.random.rand(100, 10, 16)
# One binary label (0 or 1) per sequence
y = np.random.choice(2, 100)

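model = Sequential()
# input_shape is (timesteps, features); the batch dimension is left out
model.add(LSTM(128, input_shape=(10, 16)))
# The LSTM returns only its last output, so there is one prediction per sequence
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd')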

# fit model
model.fit(X, y, epochs=3, batch_size=16)
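
Real sentences rarely all have exactly 10 words, while the model above expects a fixed number of time steps. One common way around this is to zero-pad (or truncate) every sequence to the same length. The following is only a sketch using plain NumPy; `pad_sequences_3d` is a hypothetical helper written for this example, not a Keras function, and the random arrays stand in for real word vectors.

```python
import numpy as np

def pad_sequences_3d(sequences, timesteps):
    """Zero-pad (or truncate) each (length, features) array to `timesteps` rows."""
    features = sequences[0].shape[1]
    batch = np.zeros((len(sequences), timesteps, features), dtype=np.float32)
    for i, seq in enumerate(sequences):
        n = min(len(seq), timesteps)
        batch[i, :n] = seq[:n]  # copy the real steps; the rest stay zero
    return batch

rng = np.random.default_rng(0)
# Three "sentences" of 4, 10 and 13 words, each word a 16-element vector.
sentences = [rng.random((length, 16)) for length in (4, 10, 13)]
labels = np.array([0, 1, 1])  # one label per sentence, not per word

X = pad_sequences_3d(sentences, timesteps=10)
print(X.shape, labels.shape)  # (3, 10, 16) (3,)
```

The result has the same (samples, timesteps, features) layout as the random `X` used for training, with one label per sequence. If the padded steps should be ignored during training, Keras provides a `Masking` layer that can be placed before the LSTM.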