I am a newbie in the NN field and I am doing some training with PyTorch.
I decided to make a simple vanilla NN.
I used a personal dataset I had, with 2377 numerical features and 6277 examples.
My first try was to make the NN predict each single example, so the pseudocode would look like

X = ...  # features
y = ...  # outcomes
for i in range(X.size(0)):
    y_pred = model(X[i])
    loss = criterion(y_pred, y[i])
    # y_pred.size()  -> [1, 1]
    # y[i].size()    -> [1, 1]
This took about 10 seconds per epoch, so I decided to improve it using mini-batches.
So I define the batch size at the beginning, and the NN in PyTorch is defined like this
batch_size = 30
n_inputs = X.size()[1]  # 2377

## 3 hidden layers
model = nn.Sequential(
    nn.Linear(n_inputs, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 356),
    nn.ReLU(),
    nn.Linear(356, batch_size),
    nn.ReLU(),
)
And then I do the training in batches
for epoch in range(5):
    total_loss = 0
    permutation = torch.randperm(X.size()[0])
    for i in range(0, X.size()[0], batch_size):
        optimizer.zero_grad()
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X[indices], y[indices]
        y_pred = model(batch_x)
        loss = criterion(y_pred, batch_y)
        total_loss += loss.item()
        ## update the weights
        loss.backward()
        optimizer.step()
Now the problem is that my NN always outputs 100 values, but the size of the last batch can vary.
In fact, if I choose 100 as the batch size, the last batch will be made of 77 examples (6277 % 100).
I am sure there is a way around this problem, and that there is a mistake in my structure, but I cannot see it.
Can you help me generalize the training in batches so it works with any number of examples and any batch size?
I don't know why the other answer was accepted, as there is a fundamental misunderstanding in the way you constructed the model in the question, and the other answer does not address it.
When you define a model and its input and output sizes, you still only consider a single sample; you don't use batch_size to scale the output. When you then feed a batch of input data into the model, PyTorch handles the batch internally and the model gets evaluated on each sample in parallel.
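Applied to your model, this means the last layer's output size should be the number of values you predict per sample, not batch_size. Here is a minimal sketch, assuming a single regression target per example (your per-example version had y_pred of size [1, 1], so I'm guessing one output); I also dropped the trailing ReLU so the output isn't clamped to non-negative values:

import torch
import torch.nn as nn

n_inputs = 2377   # features per example
n_outputs = 1     # assumption: one target value per example

model = nn.Sequential(
    nn.Linear(n_inputs, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 356),
    nn.ReLU(),
    nn.Linear(356, n_outputs),  # per-sample output size, independent of the batch size
)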
You can look at an official PyTorch tutorial, where they build a model for the Fashion MNIST dataset. Each image in this dataset is 28x28x1 pixels (greyscale), and there are 10 different classes to predict. Notice the first and last layers:
nn.Linear(28*28, 512)
....
nn.Linear(512, 10)
where the input is the 28*28 image pixels and the output is 10 numbers, one for each class. You can then apply a softmax to get class probabilities, or train with a cross-entropy loss (nn.CrossEntropyLoss in PyTorch, which takes the raw logits). There is no information about the batch size in the model itself, because the model doesn't need it.
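For completeness, here is a minimal sketch in the same spirit as that tutorial (not its exact code): the model maps a flattened image to 10 logits, and nn.CrossEntropyLoss compares a batch of logits to a batch of integer labels, whatever the batch size happens to be.

classifier = nn.Sequential(
    nn.Flatten(),             # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),       # 10 logits per image, one per class
)

ce_loss = nn.CrossEntropyLoss()

images = torch.randn(32, 1, 28, 28)      # a batch of 32 dummy images
labels = torch.randint(0, 10, (32,))     # 32 integer class labels
logits = classifier(images)              # shape (32, 10)
loss = ce_loss(logits, labels)
probs = logits.softmax(dim=1)            # only if you actually want probabilities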
Most of the time it is no problem for the last batch to be a bit smaller than the others. If your batch size is 32 but the last batch has only 15 samples, the model will just get those 15 samples and labels, do the prediction, and compare the 15 results to the 15 labels for that last batch.
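You can check this with the corrected model from above; the batch dimension is simply whatever you pass in:

full_batch = torch.randn(32, n_inputs)
last_batch = torch.randn(15, n_inputs)
print(model(full_batch).shape)   # torch.Size([32, 1])
print(model(last_batch).shape)   # torch.Size([15, 1])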
If for some reason you need all batches to be exactly the same size (e.g. for a stateful LSTM), then you can use a DataLoader with drop_last=True. But most of the time it is not needed, and using it just hides part of your data from the model.
Using a DataLoader is still a good idea, because it can efficiently handle loading your data on the CPU while the model trains on the GPU.
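For example, a sketch of what your training loop could look like with a DataLoader, assuming X and y are tensors and optimizer and criterion are already defined, using the corrected model from above:

from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(X, y)                        # pairs each example with its target
loader = DataLoader(dataset, batch_size=100,
                    shuffle=True, drop_last=False)   # the last batch may have 77 samples

for epoch in range(5):
    total_loss = 0.0
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        y_pred = model(batch_x)                      # shape: (current batch size, 1)
        loss = criterion(y_pred, batch_y)
        total_loss += loss.item()
        loss.backward()
        optimizer.step()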