Tags: python, deep-learning, pytorch, reinforcement-learning, autoencoder

Is it possible to avoid encoding padding when creating a sequence data encoder in PyTorch?


I am attempting to build an observation-history encoder: a model that takes as input a variable-length sequence of shape [Time, Batch, Features] (where sequences are padded out to a fixed Time length) and outputs a tensor of shape [Batch, New_Features]. My concern is that when I do dimensionality reduction with FC layers, they will take the padded data into account. Is there any way to avoid this? Or is this something I don't need to worry about because the padding will naturally become part of the unique encodings?
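
For concreteness, here is roughly the setup I have in mind (the sizes and the small FC stack below are just placeholders, not my actual model):

    import torch
    import torch.nn as nn

    T, B, F = 20, 8, 16                      # max time steps, batch size, feature dim
    lengths = torch.randint(1, T + 1, (B,))  # true (unpadded) length of each sequence
    x = torch.zeros(T, B, F)                 # padded batch
    for b, L in enumerate(lengths.tolist()):
        x[:L, b] = torch.randn(L, F)         # real data; rows L..T-1 remain zero padding

    encoder = nn.Sequential(nn.Linear(F, 32), nn.ReLU(), nn.Linear(32, 10))
    out = encoder(x)  # (T, B, 10): the FC layers process the padded rows too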


Solution

  • The easiest way to do this is to mask padding elements when you pool representations.

    For example, say you have an input (with padding) of shape (bs, sl, n) and a binary padding mask of shape (bs, sl) that has 1 for non-padding items and 0 for padding items. You could do something like this:

    import torch

    x = ...            # input tensor, shape (bs, sl, n)
    padding_mask = ... # float mask, shape (bs, sl): 1.0 for real items, 0.0 for padding

    embeddings = model(x)  # output of shape (bs, sl, n)

    # Zero out padded positions before summing, then divide by the number of
    # real elements in each sequence (clamped to avoid division by zero).
    mask = padding_mask.unsqueeze(-1)  # (bs, sl, 1)
    mean_embeddings = (embeddings * mask).sum(1) / torch.clamp(mask.sum(1), min=1e-9)
    

    When we compute mean_embeddings, we sum all the non-padding elements in embeddings, then divide by the number of non-padding positions in each sequence. This creates a mean-pooled output of shape (bs, n) without including padding elements in the pooling calculation.
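
    If you only have per-sequence lengths rather than a ready-made mask, you can build the mask with a broadcasted comparison. A minimal self-contained sketch, where the linear layer and the sizes are just stand-ins for your actual encoder:

    import torch
    import torch.nn as nn

    bs, sl, n = 4, 10, 8
    lengths = torch.tensor([10, 7, 3, 5])  # true length of each sequence
    x = torch.randn(bs, sl, n)

    # Position t is a real (non-padding) item iff t < that sequence's length.
    padding_mask = (torch.arange(sl).unsqueeze(0) < lengths.unsqueeze(1)).float()  # (bs, sl)

    model = nn.Linear(n, n)  # stand-in for the real encoder
    embeddings = model(x)    # (bs, sl, n)

    mask = padding_mask.unsqueeze(-1)  # (bs, sl, 1)
    mean_embeddings = (embeddings * mask).sum(1) / torch.clamp(mask.sum(1), min=1e-9)
    print(mean_embeddings.shape)  # torch.Size([4, 8])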