I'm trying to train a CNN model on the standard MNIST data:
import torch
from torchvision.datasets import MNIST
import torchvision.transforms as transforms

# convert images to tensors and normalize with (approximate) MNIST statistics
img_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1305,), (0.3081,))
])

train_dataset = MNIST(root='../mnist_data/',
                      train=True,
                      download=True,
                      transform=img_transforms)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=10,
                                           shuffle=True)
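For reference, the loader yields float image batches and long integer labels; a quick way to check:

xb, yb = next(iter(train_loader))
print(xb.shape, xb.dtype)  # torch.Size([10, 1, 28, 28]) torch.float32
print(yb.shape, yb.dtype)  # torch.Size([10]) torch.int64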
The model is declared as:
import torch.nn as nn
from torch import Tensor

# ConvLayer, DenseLayer and assert_dim are custom helpers (not shown here)
class MNIST_ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = ConvLayer(1, 14, 5, activation=nn.Tanh(),
                               dropout=0.8)
        self.conv2 = ConvLayer(14, 7, 5, activation=nn.Tanh(), flatten=True,
                               dropout=0.8)
        self.dense1 = DenseLayer(28 * 28 * 7, 32, activation=nn.Tanh(),
                                 dropout=0.8)
        self.dense2 = DenseLayer(32, 10)

    def forward(self, x: Tensor) -> Tensor:
        assert_dim(x, 4)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.dense1(x)
        x = self.dense2(x)
        return x
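ConvLayer and DenseLayer are thin wrappers around nn.Conv2d and nn.Linear from my helper code (DenseLayer is analogous). Roughly, ConvLayer looks like the simplified sketch below; the 'same'-style padding keeps the feature maps at 28x28, which is why dense1 takes 28 * 28 * 7 inputs. The exact details of the real helper may differ:

import torch.nn as nn

class ConvLayer(nn.Module):
    """Simplified sketch; the real helper may differ."""
    def __init__(self, in_channels, out_channels, kernel_size,
                 activation=None, dropout=1.0, flatten=False):
        super().__init__()
        # 'same'-style padding keeps the 28x28 spatial size for a 5x5 kernel
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        self.activation = activation
        self.flatten = flatten
        # the dropout argument is treated as a keep-probability (assumption)
        self.drop = nn.Dropout(1.0 - dropout) if dropout < 1.0 else None

    def forward(self, x):
        x = self.conv(x)
        if self.activation is not None:
            x = self.activation(x)
        if self.flatten:
            x = x.view(x.shape[0], -1)  # (N, C*H*W) for the dense layers
        if self.drop is not None:
            x = self.drop(x)
        return x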
Then I run the forward pass and compute the loss for this model, following the usual PyTorch approach:
import torch.optim as optim

model = MNIST_ConvNet()

for X_batch, y_batch in train_loader:
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer.zero_grad()
    output = model(X_batch)[0]
    loss = nn.CrossEntropyLoss()
    loss = loss(output, y_batch)
X_batch has the following content:
tensor([[[[-0.4236, -0.4236, -0.4236,  ..., -0.4236, -0.4236, -0.4236],
          [-0.4236, -0.4236, -0.4236,  ..., -0.4236, -0.4236, -0.4236],
          [-0.4236, -0.4236, -0.4236,  ..., -0.4236, -0.4236, -0.4236],
          ...,
And for the line of code self.loss(output, y_batch), I receive the following error:
RuntimeError: Expected floating point type for target with class probabilities, got Long
To solve the problem, I tried updating the data type:

self.model(X_batch.type(torch.FloatTensor))[0]

But this does not work.
Before getting to the error itself: please do not construct a new optimizer / loss criterion on every iteration of the training loop. The code should look something like the following:
import torch.optim as optim

model = MNIST_ConvNet()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()  # separate name, so the criterion is not
                                   # overwritten by the loss value below

for X_batch, y_batch in train_loader:
    optimizer.zero_grad()
    output = model(X_batch)
    loss = criterion(output, y_batch)
    loss.backward()   # backpropagate
    optimizer.step()  # update the weights
Additionally, the [0] indexing of the output looks like a mistake: you operate on batches, and model(X_batch)[0] extracts only the prediction for the first element of the batch. This is in fact what triggers the error: with the [0], the output has shape (10,) (ten class scores for a single sample), which happens to match the shape of y_batch (batch size 10), so CrossEntropyLoss interprets the target as class probabilities and demands a floating point type instead of Long.
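Here is a minimal standalone sketch of the shape issue (random tensors, nothing from your model):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(10, 10)           # (batch_size, num_classes), float
targets = torch.randint(0, 10, (10,))  # (batch_size,), long class indices

print(criterion(logits, targets))      # fine: (N, C) logits vs. (N,) targets

# logits[0] has shape (10,), the same shape as targets, so CrossEntropyLoss
# switches to the "class probabilities" interpretation and requires float:
criterion(logits[0], targets)          # RuntimeError: Expected floating point type ...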
If this does not solve it yet, you might try casting the tensors to float, e.g. output.float(). Note, however, that with class-index targets like the MNIST labels, CrossEntropyLoss expects y_batch to remain Long, so only the output (the logits) should ever need the cast.
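With class-index targets, the call should end up looking like this (reusing the names from the loop above):

loss = criterion(output.float(), y_batch)  # logits: (N, C) float, targets: (N,) long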
Hope this helps.