
Error with dimensionality when fitting a stateful RNN

I am fitting a stateful RNN with an embedding layer to perform binary classification. I am confused about the batch_size and batch_shape arguments required by the functional API.

xtrain_padded.shape = (9600, 1403); xtest_padded.shape = (2400, 1403); ytest.shape = (2400,)
input_dim = size of tokenizer word dictionary
output_dim = 100 from GloVe_100d embeddings
number of SimpleRNN layer units = 200

h0: initial hidden states sampled from random uniform. 
h0 object has the same shape as RNN layer hidden states obtained when return_state = True.

The model structure:

batch_size = 2400  # highest common factor of xtrain and xtest
inp= Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out= Embedding(input_dim, output_dim, input_length= input_length, 
                         weights= [Emat], trainable= False, name= 'embedding')(inp)

rnn= SimpleRNN(200, return_sequences= True, return_state= True, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')

h0 = tf.random.uniform((batch_size, input_length, 200))
rnn_out, rnn_state = rnn(emb_out, initial_state=h0)
mod_out= Dense(1, activation= 'sigmoid')(rnn_out)
# Extract the y_t's and h_t's:
model = Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(2400, 1403)]            0         
embedding (Embedding)        (2400, 1403, 100)         4348900   
simpleRNN (SimpleRNN)        [(2400, 1403, 200), (2400 60200     
dense_3 (Dense)              (2400, 1403, 1)           201       

There is no issue when I call the model directly on the test data:

mod_out_allsteps, rnn_ht= model(xte_pad)  # Same as the 2 items from model.predict(xte_pad) 
print(mod_out_allsteps.shape, rnn_ht.shape) 
>> (2400, 1403, 1) (2400, 1403, 200)

However, it raises a ValueError about unequal dimensions when I call model.fit(xte_pad, yte, epochs=1, batch_size=batch_size, verbose=1):
    ValueError: Dimensions must be equal, but are 2400 and 1403 for '{{node binary_crossentropy_1/mul}} = Mul[T=DT_FLOAT](binary_crossentropy_1/Cast, binary_crossentropy_1/Log)' with input shapes: [2400,1], [2400,1403,200].

The error seems to suggest that the model has confused the returned hidden states rnn_ht, shaped [2400, 1403, 200], with something else when fitting the data. However, I need these states to compute the gradients with respect to the initial hidden states for t = 1, ..., 1403.

I am confused about the dimensions in stateful RNNs:

  1. If stateful = True, are we constructing the model based on one mini-batch?
    i.e. the first index in Output Shape of each layer will be the batch_size?
  2. What is the batch_shape to be set in the first layer (Input)? Have I set it right?

Thank you in advance for helping with the error and my confusion!


batch_size = 2400  # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out=  tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)

rnn=  tf.keras.layers.SimpleRNN(200, return_sequences= True, return_state= False, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
rnn_ht= rnn(emb_out)  # hidden states at all steps, shape (2400, 1403, 200)

mod_out= tf.keras.layers.Dense(1, activation= 'sigmoid')(tf.keras.layers.Flatten()(rnn_ht))
# Extract the y_t's and h_t's:
model =  tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_ht])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(2400, 1403)]            0         
embedding (Embedding)        (2400, 1403, 100)         50000     
simpleRNN (SimpleRNN)        (2400, 1403, 200)         60200     
flatten_4 (Flatten)          (2400, 280600)            0         
dense_4 (Dense)              (2400, 1)                 280601    

mod_out_allsteps, rnn_ht= model(xte_pad)
print(mod_out_allsteps.shape, rnn_ht.shape)
>> (2400, 1) (2400, 1403, 200)

But the error with `model.fit` persists.


  • Look at the last layer in your model summary. Since you set the parameter return_sequences to True in the RNN layer, you are getting a sequence with the same number of time steps as your input and an output space of 200 for each timestep, hence the shape (2400, 1403, 200), where 2400 is the batch size. If you set this parameter to False, everything should work, since your labels have the shape (2400, 1).

    Working example:

    import tensorflow as tf
    batch_size = 2400  # highest common factor of xtrain and xtest
    input_length = 1403
    output_dim = 100
    inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input') 
    emb_out=  tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)
    rnn=  tf.keras.layers.SimpleRNN(200, return_sequences= False, return_state= True, stateful= True,
                  batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
    rnn_out, rnn_state = rnn(emb_out)
    mod_out=  tf.keras.layers.Dense(1, activation= 'sigmoid')(rnn_out)
    # Extract the y_t's and h_t's:
    model =  tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_out])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

    where the first output is your binary decision.
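
To make the shape argument concrete, here is a minimal NumPy sketch of a simple RNN recurrence. It is not Keras itself: the helper name `simple_rnn_forward` and the tiny sizes are made up for illustration. It shows the output shape with and without returning every time step, and that labels of shape (batch, 1) cannot be broadcast against a (batch, steps, units) tensor, which looks like the same mismatch the ValueError reports.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences):
    """Toy tanh RNN: h_t = tanh(x_t @ W + h_{t-1} @ U + b). Illustration only."""
    batch, steps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))          # initial hidden state h_0
    all_h = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        all_h.append(h)
    if return_sequences:
        return np.stack(all_h, axis=1)    # one state per step: (batch, steps, units)
    return h                              # last state only: (batch, units)

rng = np.random.default_rng(0)
batch, steps, emb_dim, units = 4, 6, 3, 5  # stand-ins for 2400, 1403, 100, 200
x = rng.normal(size=(batch, steps, emb_dim))
W = rng.normal(size=(emb_dim, units))
U = rng.normal(size=(units, units))
b = np.zeros(units)

seq_out = simple_rnn_forward(x, W, U, b, return_sequences=True)
last_out = simple_rnn_forward(x, W, U, b, return_sequences=False)
print(seq_out.shape, last_out.shape)

# Labels shaped (batch, 1) cannot be broadcast against (batch, steps, units):
# aligning from the right, `batch` is compared with `steps`.
labels = np.zeros((batch, 1))
try:
    np.broadcast_shapes(labels.shape, seq_out.shape)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)
```

With the sizes from the question, the same comparison would pit 2400 against 1403, the two numbers named in the error message.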

    Update 1: with Flatten:

    import tensorflow as tf
    batch_size = 2400  # highest common factor of xtrain and xtest
    input_length = 1403
    output_dim = 100
    inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input') 
    emb_out=  tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)
    rnn=  tf.keras.layers.SimpleRNN(200, return_sequences= True, return_state= True, stateful= True,
                  batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
    rnn_out, rnn_state = rnn(emb_out)
    mod_out=  tf.keras.layers.Dense(1, activation= 'sigmoid')(tf.keras.layers.Flatten()(rnn_out))
    # Extract the y_t's and h_t's:
    model =  tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_out])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
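
Both examples keep two outputs in the model that is passed to `fit`. As far as I know, when a Keras model with several outputs is compiled with a single loss, that loss is applied to every output, so the hidden states are also scored against the labels. Below is a minimal sketch of one possible way around this, assuming it is acceptable to build two models over the same layers: one with only the sigmoid output for training, and one that also returns the hidden states. The tiny sizes and the names `train_model` and `state_model` are made up for illustration, and the embedding is randomly initialised rather than loaded from GloVe.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-ins for 2400, 1403 and the tokenizer dictionary size.
batch_size, input_length, vocab_size = 4, 6, 50

# A stateful RNN needs a fixed batch dimension, so it is part of the Input shape.
inp = tf.keras.layers.Input(batch_shape=(batch_size, input_length), name='input')
emb_out = tf.keras.layers.Embedding(vocab_size, 8, trainable=False, name='embedding')(inp)
rnn_ht = tf.keras.layers.SimpleRNN(16, return_sequences=True, stateful=True,
                                   name='simpleRNN')(emb_out)
mod_out = tf.keras.layers.Dense(1, activation='sigmoid')(tf.keras.layers.Flatten()(rnn_ht))

# Training model: a single output, so the loss only sees the sigmoid prediction.
train_model = tf.keras.Model(inputs=inp, outputs=mod_out)
train_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

# Second model over the same layers (shared weights) that also returns the h_t's.
state_model = tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_ht])

# Random data; the number of samples must be a multiple of batch_size.
x = np.random.randint(0, vocab_size, size=(2 * batch_size, input_length))
y = np.random.randint(0, 2, size=(2 * batch_size, 1)).astype('float32')
train_model.fit(x, y, epochs=1, batch_size=batch_size, shuffle=False, verbose=0)

probs, states = state_model.predict(x, batch_size=batch_size, verbose=0)
print(probs.shape, states.shape)
```

Because every layer is built with the batch dimension from `Input`, the first index of each output shape is the batch size, and both `fit` and `predict` have to be called with that same `batch_size`.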