Tags: python, machine-learning, pytorch, neural-network

How to train a NN in batches when the dataset size is not a multiple of the batch size?


I am a newbie in the NN field and I am doing some training with PyTorch.
I decided to build a simple vanilla NN.
I used a personal dataset with 2377 numerical features and 6277 examples.

My first try was to make the NN predict each example one at a time, so the pseudocode looks like this:

X = ...  # features, shape [6277, 2377]
y = ...  # outcomes

for i in range(X.size(0)):
    y_pred = model(X[i])
    loss = criterion(y_pred, y[i])

    # y_pred.size()  -> [1, 1]
    # y[i].size()    -> [1, 1]

This took about 10 seconds per epoch, so I decided to speed it up using mini-batches.

So I define the batch size at the beginning, and the NN in PyTorch is defined like this:

batch_size = 30
n_inputs = X.size(1)  # 2377

## 2 hidden layers
model = nn.Sequential(
    nn.Linear(n_inputs, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 356),
    nn.ReLU(),
    nn.Linear(356, batch_size),
    nn.ReLU(),
)

And then I do the training in batches

for epoch in range(5):
    total_loss = 0
    permutation = torch.randperm(X.size(0))
    for i in range(0, X.size(0), batch_size):
        optimizer.zero_grad()
        indices = permutation[i:i + batch_size]
        batch_x, batch_y = X[indices], y[indices]

        y_pred = model(batch_x)
        loss = criterion(y_pred, batch_y)
        total_loss += loss.item()

        ## update the weights
        loss.backward()
        optimizer.step()

Now the problem is that my NN always outputs batch_size values, but the size of the last batch can vary.
For example, if I choose 100 as the batch size, the last batch will contain only 77 examples (6277 % 100).

I am sure there is a way around this problem, and that there is a mistake in my structure, but I cannot see it.

Can you help me generalize the batch training so it works with any number of examples and any batch size?


Solution

I don't know why the other answer was accepted, because there is a fundamental misunderstanding in how the model is constructed in the question, and that answer does not address it.
When you define a model and its input and output sizes, you only consider a single sample. You do not use batch_size to scale the output layer. When you then feed a batch of input data into the model, PyTorch handles the batch dimension internally and the model is evaluated on each sample in parallel.
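
As an illustration, here is a corrected sketch of the model from the question: the width of the last layer is the number of outputs per sample (assumed here to be a single regression target), not the batch size. The final ReLU is also dropped under the assumption that the target can take any real value.

import torch
import torch.nn as nn

n_inputs = 2377  # features per sample, as in the question

# Layer sizes describe ONE sample; the batch dimension is implicit.
model = nn.Sequential(
    nn.Linear(n_inputs, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 356),
    nn.ReLU(),
    nn.Linear(356, 1),  # one output per sample, regardless of batch size
)

# Any batch size works: 30 samples in, 30 predictions out.
dummy_batch = torch.randn(30, n_inputs)
print(model(dummy_batch).shape)  # torch.Size([30, 1])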

You can look at an official PyTorch tutorial, where a model is built for the Fashion MNIST dataset. Each image in this dataset is 28x28x1 pixels (greyscale), and there are 10 different classes to predict. Notice the first and last layers:

nn.Linear(28*28, 512)
....
nn.Linear(512, 10)

The input is the 28*28 image pixels and the output is 10 numbers, one score per class. You can then apply softmax to the scores, or train with cross-entropy loss. There is no information about the batch size anywhere in the model itself, because the model does not need it.
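
For a quick sanity check, here is a small sketch (with random tensors standing in for Fashion MNIST data) showing that the batch dimension appears only in the data, while the model and the loss are defined per sample:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),                # [N, 1, 28, 28] -> [N, 784]
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),          # 10 class scores per image; no batch size here
)

criterion = nn.CrossEntropyLoss()            # applies log-softmax internally

images = torch.randn(64, 1, 28, 28)          # a batch of 64 fake greyscale images
labels = torch.randint(0, 10, (64,))         # 64 class labels

logits = model(images)                       # shape [64, 10]
loss = criterion(logits, labels)             # one scalar loss for the whole batch
print(logits.shape, loss.item())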

Most of the time it is no problem for the last batch to be a bit smaller than the others. If your batch size is 32 but the last batch has only 15 samples, the model simply gets 15 samples and labels, makes its predictions, and compares the 15 results to the 15 labels.
If for some reason you need all batches to be exactly the same size (e.g. for a stateful LSTM), you can use a DataLoader with drop_last=True. Most of the time this is not needed, and using it just withholds data from the model.
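
A minimal sketch of that option, with tensors shaped like the dataset in the question (the data here is random and only for illustration):

import torch
from torch.utils.data import TensorDataset, DataLoader

X = torch.randn(6277, 2377)   # 6277 examples, 2377 features
y = torch.randn(6277, 1)

# drop_last=True discards the final 77-sample batch;
# with the default drop_last=False it would simply be a smaller batch.
loader = DataLoader(TensorDataset(X, y), batch_size=100, shuffle=True, drop_last=True)

for batch_x, batch_y in loader:
    assert batch_x.shape[0] == 100   # every batch now has exactly 100 samples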

Using a DataLoader is still a good idea, because it can efficiently load your data on the CPU while the model trains on the GPU.
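
To tie it together, here is one way the training loop from the question could look with a DataLoader; the device handling, optimizer, loss, and hyperparameters are assumptions for the sketch, not taken from the question.

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

X = torch.randn(6277, 2377)              # placeholder data with the question's shapes
y = torch.randn(6277, 1)

loader = DataLoader(
    TensorDataset(X, y),
    batch_size=100,
    shuffle=True,        # replaces the manual torch.randperm shuffling
    num_workers=2,       # CPU workers load batches in the background
    pin_memory=True,     # speeds up CPU-to-GPU transfers
)                        # note: num_workers > 0 may need an  if __name__ == "__main__"  guard on Windows

model = nn.Sequential(
    nn.Linear(2377, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 356), nn.ReLU(),
    nn.Linear(356, 1),                   # one output per sample
).to(device)

criterion = nn.MSELoss()                                   # assumed regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer

for epoch in range(5):
    total_loss = 0.0
    for batch_x, batch_y in loader:                # the last batch may be smaller
        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"epoch {epoch}: total loss {total_loss:.4f}")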