Keras has a preprocessing util to pad sequences, but it assumes that the sequences are integer numbers.
My sequences are vectors (my own embeddings; I do not want to use Keras embeddings). Is there any way I can pad them for use in an LSTM?
Sequences can be padded to equal length in plain Python, but the padding utilities in Keras plug into its masking machinery, which lets layers like LSTM ignore the padded timesteps.
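As an aside, the built-in padding helper may already cover this case: the implementation of tf.keras.preprocessing.sequence.pad_sequences appears to accept sequences whose timesteps are float vectors when you pass a float dtype, even though its documentation talks about lists of integers. A small hedged sketch worth trying:

import numpy as np
import tensorflow as tf

# four sequences of different lengths, each timestep a 300-d float vector
seqs = [np.random.uniform(0, 1, (i, 300)) for i in range(1, 5)]
# pad with zero vectors at the end, up to 100 timesteps
padded = tf.keras.preprocessing.sequence.pad_sequences(
    seqs, maxlen=100, dtype='float32', padding='post', value=0.0)
print(padded.shape)  # (4, 100, 300)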
Here is one possibility for zero-padding arrays of floats that have different lengths.
To mask the zeros you can use a Masking layer (otherwise remove it).
I initialize your embeddings in a list because numpy can't handle arrays of different lengths. In the example I use 4 samples of different lengths, whose embeddings have shapes (1, 300), (2, 300), (3, 300) and (4, 300).
import numpy as np
import tensorflow as tf

# recreate your embeddings: 4 samples of lengths 1 to 4, each timestep a 300-d vector
emb = []
for i in range(1, 5):
    emb.append(np.random.uniform(0, 1, (i, 300)))

# custom padding function: zero-pad each sample to max_len timesteps
def pad(x, max_len):
    new_x = np.zeros((max_len, x.shape[-1]))
    new_x[:len(x), :] = x  # post padding
    return new_x

# pad own embeddings and stack into a single array of shape (4, 100, 300)
emb = np.stack(list(map(lambda x: pad(x, max_len=100), emb)))

emb_model = tf.keras.Sequential()
emb_model.add(tf.keras.layers.Masking(mask_value=0., input_shape=(100, 300)))
emb_model.add(tf.keras.layers.LSTM(32))
emb_model(emb)
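If you want to verify that the Masking layer really hides the padded timesteps from the LSTM, one option (a small sketch reusing the emb array built above) is to ask the layer for the mask it computes:

masking_layer = tf.keras.layers.Masking(mask_value=0.)
mask = masking_layer.compute_mask(tf.constant(emb))
print(mask.shape)       # (4, 100)
print(mask.numpy()[0])  # True only for the first (real) timestep of sample 0, False for the padding

The usual caveat with this approach is that any real timestep that happens to be entirely zero would be masked as well, since Masking only looks at the values.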