
Padding sequences of 2D elements in keras


I have a set of samples, each being a sequence of attribute vectors (for example, a sample can comprise 10 timesteps, each having 5 attributes). The number of attributes is always fixed, but the number of timesteps can vary from sample to sample. I want to use this sample set to train an LSTM network in Keras for a classification problem, so I need to pad all samples in a batch to the same length. But the pad_sequences preprocessor in Keras takes a fixed number of sequences with a variable number of attributes and pads the missing attributes in each sequence, while I need to add more timesteps of a fixed attribute length to each sample. So I think I cannot use it, and therefore I padded my samples separately, built a unified dataset, and fed my network with it. Is there a shortcut with Keras functions to do this?
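
For reference, the manual padding I ended up doing looks roughly like this (the shapes are made up, just for illustration):

    import numpy as np

    # three samples with 7, 10 and 12 timesteps, each timestep having 5 fixed attributes
    samples = [np.random.rand(7, 5), np.random.rand(10, 5), np.random.rand(12, 5)]

    max_len = max(s.shape[0] for s in samples)
    padded = np.zeros((len(samples), max_len, 5), dtype='float32')
    for i, s in enumerate(samples):
        padded[i, :s.shape[0], :] = s   # missing timesteps stay zero at the end

    # padded now has shape (3, 12, 5) and can be fed to the network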

Also, I have heard about masking the padded input data during training, but I am not sure if I really need it, since my classifier assigns one class label after processing the whole sample sequence. Do I need it? And if so, could you please help me with a simple example of how to do that?


Solution

  • Unfortunately, the documentation is quite misleading here, but pad_sequences does exactly what you want. For example, this code

    import numpy as np
    from keras.preprocessing.sequence import pad_sequences  # tensorflow.keras.preprocessing.sequence in newer versions

    length3 = np.random.uniform(0, 1, size=(3, 2))
    length4 = np.random.uniform(0, 1, size=(4, 2))
    pad_sequences([length3, length4], dtype='float32', padding='post')
    

    results in

    [[[0.0385175  0.4333343 ]
      [0.332416   0.16542904]
      [0.69798684 0.45242336]
      [0.         0.        ]]
    
     [[0.6518417  0.87938637]
      [0.1491589  0.44784057]
      [0.27607143 0.02688376]
      [0.34607577 0.3605469 ]]]
    

    So, here we have two sequences of different lengths, each timestep having two features, and the result is one numpy array where the shorter of the two sequences got padded with zeros.

    Regarding your other question: masking is a tricky topic, in my experience, but LSTMs handle it fine. Just use a Masking() layer as your very first one. By default it makes the LSTM ignore every timestep whose values are all zeros, so in your case exactly the ones you added via padding. But you can use any value for masking, just as you can use any value for padding; if possible, choose a value that does not occur in your dataset.
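
    As a minimal sketch (the layer sizes and the binary classification head are my own assumptions, and I use the standalone keras imports; in newer versions use tensorflow.keras instead), a masked model could look like this:

    from keras.models import Sequential
    from keras.layers import Masking, LSTM, Dense

    model = Sequential([
        Masking(mask_value=0.0, input_shape=(None, 2)),  # skip timesteps whose features are all 0.0
        LSTM(32),                                        # masked timesteps do not update the state
        Dense(1, activation='sigmoid')                   # one class label per whole sequence
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    The input_shape=(None, 2) matches the two features per timestep from the example above; None lets the padded length vary between batches.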

    If you don't use masking, you run the risk that your LSTM learns that the padded values have some meaning, while in reality they don't.

    For example, if during training you feed in the sequence

    [[1,2],
     [2,1],
     [0,0],
     [0,0],
     [0,0]]
    

    and later feed the trained network only

    [[1,2],
     [2,1]]
    

    you could get unexpected results (though not necessarily). Masking avoids this by excluding the masked timesteps from training.
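
    If you want to convince yourself, here is a quick sketch (again with assumed layer sizes and an arbitrary, untrained model): with the Masking layer in place, the padded and the unpadded version of the sequence give the same prediction, because the zero timesteps never reach the LSTM.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Masking, LSTM, Dense

    model = Sequential([
        Masking(mask_value=0.0, input_shape=(None, 2)),
        LSTM(8),
        Dense(1, activation='sigmoid')
    ])

    short = np.array([[[1., 2.], [2., 1.]]])                                 # shape (1, 2, 2)
    padded = np.array([[[1., 2.], [2., 1.], [0., 0.], [0., 0.], [0., 0.]]])  # shape (1, 5, 2)

    # both calls print the same value, since the masked timesteps are skipped
    print(model.predict(short))
    print(model.predict(padded))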