Tags: python, deep-learning, pytorch, reinforcement-learning, autoencoder

Is it possible to avoid encoding padding when creating a sequence data encoder in PyTorch?


I am attempting to build an observation-history encoder: a model that takes as input a variable-length sequence of shape [Time, Batch, Features] (where sequences are padded out to a fixed Time length) and outputs a tensor of shape [Batch, New_Features]. My concern is that when I do dimensionality reduction with FC layers, they will take the padded data into account. Is there any way to avoid this? Or is this something I don't need to worry about because the padding will naturally become part of the unique encodings?
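
For concreteness, here is roughly the setup I have in mind (the sizes and the small FC stack below are just placeholders, not my actual model):

    import torch
    import torch.nn as nn

    T, B, F = 20, 8, 16                      # max time steps, batch size, feature dim
    lengths = torch.randint(1, T + 1, (B,))  # true (unpadded) length of each sequence
    x = torch.zeros(T, B, F)                 # padded batch
    for b, L in enumerate(lengths.tolist()):
        x[:L, b] = torch.randn(L, F)         # real data; rows L..T-1 remain zero padding

    encoder = nn.Sequential(nn.Linear(F, 32), nn.ReLU(), nn.Linear(32, 10))
    out = encoder(x)  # (T, B, 10): the FC layers process the padded rows too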


Solution

  • The easiest way to do this is to mask padding elements when you pool representations.

    For example, say you have an input (with padding) of shape (bs, sl, n) and a binary padding mask of shape (bs, sl) that has 1 for non-padding items and 0 for padding items. You could do something like this:

    import torch

    x = ...            # input tensor, shape (bs, sl, n)
    padding_mask = ... # float mask, shape (bs, sl): 1.0 for real items, 0.0 for padding

    embeddings = model(x)  # output of shape (bs, sl, n)

    # Zero out padded positions before summing, then divide by the number of
    # real elements in each sequence (clamped to avoid division by zero).
    mask = padding_mask.unsqueeze(-1)  # (bs, sl, 1)
    mean_embeddings = (embeddings * mask).sum(1) / torch.clamp(mask.sum(1), min=1e-9)
    

    When we compute mean_embeddings, we sum all the non-padding elements in embeddings, then divide by the number of non-padding positions in each sequence. This creates a mean-pooled output of shape (bs, n) without including padding elements in the pooling calculation.
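
    If you only have per-sequence lengths rather than a ready-made mask, you can build the mask with a broadcasted comparison. A minimal self-contained sketch, where the linear layer and the sizes are just stand-ins for your actual encoder:

    import torch
    import torch.nn as nn

    bs, sl, n = 4, 10, 8
    lengths = torch.tensor([10, 7, 3, 5])  # true length of each sequence
    x = torch.randn(bs, sl, n)

    # Position t is a real (non-padding) item iff t < that sequence's length.
    padding_mask = (torch.arange(sl).unsqueeze(0) < lengths.unsqueeze(1)).float()  # (bs, sl)

    model = nn.Linear(n, n)  # stand-in for the real encoder
    embeddings = model(x)    # (bs, sl, n)

    mask = padding_mask.unsqueeze(-1)  # (bs, sl, 1)
    mean_embeddings = (embeddings * mask).sum(1) / torch.clamp(mask.sum(1), min=1e-9)
    print(mean_embeddings.shape)  # torch.Size([4, 8])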