python pytorch batch-normalization torchtext

Using BatchNorm1d layer with Embedding and Linear layers for NLP text-classification problem throws RuntimeError

I am trying to create a neural network and train my own Embeddings. The network has the following structure (PyTorch):

import torch.nn as nn

class MultiClassClassifer(nn.Module):
  #define all the layers used in model
  def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
    
    #Constructor
    super(MultiClassClassifer, self).__init__()

    #embedding layer
    self.embedding = nn.Embedding(vocab_size, embedding_dim)

    #dense layer
    self.hiddenLayer = nn.Linear(embedding_dim, hidden_dim)

    #Batch normalization layer
    self.batchnorm = nn.BatchNorm1d(hidden_dim)

    #output layer
    self.output = nn.Linear(hidden_dim, output_dim)

    #activation layer
    self.act = nn.Softmax(dim=1) #2d-tensor

    #initialize weights of embedding layer
    self.init_weights()

  def init_weights(self):

    initrange = 1.0
    
    self.embedding.weight.data.uniform_(-initrange, initrange)
  
  def forward(self, text):

    embedded = self.embedding(text)

    hidden_1 = self.batchnorm(self.hiddenLayer(embedded))

    return self.act(self.output(hidden_1))

My training_iterator object looks like:

 batch = next(iter(train_iterator))
 batch.text_normalized_tweet[0]

tensor([[ 240,  538,  305,   73,    9,  780, 2038,   13,   48,    1,    1,    1,
            1,    1,    1,    1,    1,    1,    1,    1,    1],
        [ 853,   57,    2,   70, 1875,  176,  466,    1,    1,    1,    1,    1,
            1,    1,    1,    1,    1,    1,    1,    1,    1],
        ...])

with shape: torch.Size([32, 25]), where 32 is the batch_size I used to create the training iterator with data.BucketIterator and 25 is the padded length of the sequences in the batch.

When I create a model instance:

INPUT_DIM = len(TEXT.vocab) #~5,000 tokens
EMBEDDING_DIM = 100
HIDDEN_DIM = 64
OUTPUT_DIM = 3 #target has 3 classes

model = MultiClassClassifer(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)

and execute

model(batch.text_normalized_tweet[0]).squeeze(1)

I get back the following RuntimeError

RuntimeError: running_mean should contain 15 elements not 64

You may also find my Colab notebook here.

Solution

I found a workaround based on the example given by @jhso (above).

INPUT_DIM = len(TEXT.vocab) #~5,000 tokens
EMBEDDING_DIM = 100
HIDDEN_DIM = 64

e = nn.Embedding(INPUT_DIM, EMBEDDING_DIM)
l = nn.Linear(EMBEDDING_DIM, HIDDEN_DIM)
b = nn.BatchNorm1d(HIDDEN_DIM)
soft = nn.Softmax(dim=1)
out = nn.Linear(HIDDEN_DIM, 3)

text, text_lengths = batch.text_normalized_tweet
y = e(text)
# Pack once: .data holds the non-padded time steps, .batch_sizes the batch size at each step
packed = nn.utils.rnn.pack_padded_sequence(y, text_lengths, batch_first=True)
tensor, batch_size = packed.data, packed.batch_sizes
y = b(l(tensor))

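I added the pack_padded_sequence() function from the nn.utils.rnn module, which takes the embeddings as input. Its data tensor is 2-D (one row per non-padded token), so after the linear layer BatchNorm1d sees HIDDEN_DIM as the feature dimension. I also had to unpack both text and text_lengths, since the way I created the training iterator it returns two outputs (text, text_lengths).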
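
To make the shapes concrete, here is a minimal, self-contained sketch of why the original call fails and why packing helps. It uses a fake batch instead of the real train_iterator, and the sizes (VOCAB_SIZE, BATCH_SIZE, SEQ_LEN, PAD_IDX) are made-up values for illustration only. It also shows one possible alternative that is not part of the workaround above: permuting the padded tensor to (batch, features, length), which is the 3-D layout BatchNorm1d expects. Note that the permute variant includes the padding positions in the batch statistics, whereas packing leaves them out.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

# Hypothetical sizes standing in for the real vocabulary and batch.
VOCAB_SIZE, EMBEDDING_DIM, HIDDEN_DIM = 50, 100, 64
BATCH_SIZE, SEQ_LEN, PAD_IDX = 4, 6, 1

embedding = nn.Embedding(VOCAB_SIZE, EMBEDDING_DIM)
linear = nn.Linear(EMBEDDING_DIM, HIDDEN_DIM)
batchnorm = nn.BatchNorm1d(HIDDEN_DIM)

# Fake padded batch; lengths are sorted in decreasing order, which
# pack_padded_sequence requires by default (enforce_sorted=True).
text_lengths = torch.tensor([6, 5, 3, 2])
text = torch.randint(2, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN))
for row, length in enumerate(text_lengths.tolist()):
    text[row, length:] = PAD_IDX

embedded = embedding(text)   # (4, 6, 100)
hidden = linear(embedded)    # (4, 6, 64)

# For 3-D input BatchNorm1d treats dim 1 (here the sequence length, 6)
# as the feature dimension, which does not match num_features=64.
try:
    batchnorm(hidden)
    raised = False
except (RuntimeError, ValueError):
    raised = True

# Variant 1 (the workaround above): pack, then normalize the 2-D data tensor.
packed = pack_padded_sequence(embedded, text_lengths, batch_first=True)
normed_packed = batchnorm(linear(packed.data))   # (sum of lengths, 64)

# Variant 2 (an alternative sketch): move features to dim 1 and back.
normed_padded = batchnorm(hidden.permute(0, 2, 1)).permute(0, 2, 1)

print(raised, tuple(normed_packed.shape), tuple(normed_padded.shape))
```

Both variants still produce one vector per token rather than one per tweet, so a classifier over whole tweets would need some additional pooling step, which this sketch does not cover.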