keras lstm word-embedding sequence-to-sequence

Merging sequence embedding with Time Series Features

I am having trouble with certain aspects of the Keras LSTM implementation. Here is a description of my problem:

I am trying to train a model for word correctness prediction. My model has two types of inputs:

A word sequence (a sentence)
A sequence of feature vectors (for each word I compute a feature vector of length 6)

e.g.

input_1 = ['we', 'have', 'two', 'review'] 
input_2 = [
           [1.25, 0.01, 0.000787, 5.235, 0.0, 0.002091], 
           [ 0.0787, 0.02342, 5.4595, 0.002091, 0.003477, 0.0], 
           [0.371533, 0.529893, 0.371533, 0.6, 0.0194156, 0.003297],
           [0.471533, 0.635, 0.458, 0.7, 0.0194156, 0.0287]
          ] 

 gives output = [1, 1, 2, 1]

Since the sentences in my training set have different lengths, I need to zero-pad them so that they all have the same length.

My question is about the second input: should I pad it as well, and if so, how, given that its elements are vectors?

Model architecture:

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed, concatenate

# Word indices; index 0 is the padding value because mask_zero=True
input1 = Input(shape=(seq_length,), dtype='int32')
emb = Embedding(input_dim=num_words, output_dim=num_dimension,
                input_length=seq_length, weights=[embeddings],
                mask_zero=True, trainable=False)(input1)

# Six features per word
input2 = Input(shape=(seq_length, 6))
x = concatenate([emb, input2], axis=2)

forwards = LSTM(64, return_sequences=True)(x)
# Note: go_backwards=True returns the output sequence in reversed order
backwards = LSTM(128, return_sequences=True, go_backwards=True)(x)

common = concatenate([forwards, backwards], axis=-1)
out = TimeDistributed(Dense(no_targets, activation='softmax'))(common)

Solution

You are on the right track, and yes, you need to pad your second input with zero rows so that it matches the padded sentence length. It would look like this:

# Input 1
X1 = [[12, 34, 3], [6, 7, 0]] # where numbers are word indices and 0 is padding
# Input 2
X2 = [[[1.23,...,2.4], [1.24, ...], [0.6, ...]], [[3.25, ...], [2.4, ...], [0,0,0,0,0,0]]]
# So the padded words get zero feature vectors as well and the shapes match
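
One way to build such padded inputs is sketched below. This is a minimal plain-Python sketch, not a Keras utility: the helper names pad_word_ids and pad_features are hypothetical, the feature values are made up, and it assumes padding at the end of each sentence with word index 0.

```python
def pad_word_ids(sentences, max_len, pad_id=0):
    """Right-pad (or truncate) each list of word indices to max_len."""
    return [list(s[:max_len]) + [pad_id] * (max_len - len(s)) for s in sentences]


def pad_features(feature_seqs, max_len, n_features=6):
    """Right-pad (or truncate) each sequence of feature vectors with zero rows."""
    padded = []
    for seq in feature_seqs:
        rows = [list(vec) for vec in seq[:max_len]]
        # Append one all-zero feature vector per missing time step.
        rows.extend([0.0] * n_features for _ in range(max_len - len(rows)))
        padded.append(rows)
    return padded


# Illustrative data: two sentences of lengths 3 and 2 (values are made up).
word_ids = [[12, 34, 3], [6, 7]]
features = [
    [[0.1] * 6, [0.2] * 6, [0.3] * 6],
    [[0.4] * 6, [0.5] * 6],
]

X1 = pad_word_ids(word_ids, max_len=3)
X2 = pad_features(features, max_len=3)

print(X1)        # [[12, 34, 3], [6, 7, 0]]
print(X2[1][2])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Wrapping the results in numpy.asarray gives arrays of shape (2, 3) and (2, 3, 6), which match input1 and input2. Note that Keras's own pad_sequences pads at the front by default, so if you use it for the word indices, pad the feature vectors on the same side.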

But fear not: because you concatenate emb with input2, the mask created by mask_zero=True is propagated to the concatenated tensor, so the LSTM skips the padded time steps of the second input as well.
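
To see why the mask survives the concatenation, here is a rough plain-Python emulation of the idea. It is not Keras's actual code: the helper names are hypothetical, and it assumes that merging combines per-time-step masks with a logical AND and treats an input without a mask as "all time steps valid".

```python
def embedding_mask(word_ids, pad_id=0):
    """True for real words, False for padding (what mask_zero=True marks)."""
    return [[w != pad_id for w in sentence] for sentence in word_ids]


def merge_masks(mask_a, mask_b):
    """Combine two per-time-step masks; None means 'no mask', i.e. all True."""
    if mask_a is None:
        return mask_b
    if mask_b is None:
        return mask_a
    return [[a and b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


X1 = [[12, 34, 3], [6, 7, 0]]   # padded word indices from the example above
emb_mask = embedding_mask(X1)   # mask attached to the embedding output
feat_mask = None                # a plain Input carries no mask

mask = merge_masks(emb_mask, feat_mask)
print(mask)  # [[True, True, True], [True, True, False]]
```

The last time step of the second sentence stays masked, so whatever values sit in its feature row never reach the LSTM state.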