I can't make any NN work in PyTorch. What am I doing wrong?

I work with data and have decent Python skills. I know how to work with different models, but I have never tried neural networks before.
So I am new to PyTorch, and I decided to learn it from online tutorials and videos.

Unfortunately, I discovered I really can't make these models work: I get extremely wrong results. This happens regardless of the guide I follow, so it's definitely something I am doing wrong.

For example, I followed this step-by-step guide on how to build a NN for regression using the Boston Housing Dataset.

Here is my code, which I basically copied from the guide, so there should not be any difference.

import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.preprocessing import StandardScaler
import pandas as pd

### importing the dataset
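boston = pd.read_csv('./housing.csv', header=None, sep=r'\s+')
# standard Boston Housing column names (assumed: 13 features plus the MEDV target)
boston.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                  'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']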
xcol = boston.drop(columns=['MEDV']).columns
ycol = ['MEDV']

X = boston[xcol].values
y = boston[ycol].values

### Creating the Torch Dataset
class TorchDataset(torch.utils.data.Dataset):
    def __init__(self, X, y, scale_data=True):
        if not torch.is_tensor(X) and not torch.is_tensor(y):
            if scale_data:
                X = StandardScaler().fit_transform(X)
            self.X = torch.from_numpy(X)
            self.y = torch.from_numpy(y)

    def __len__(self):
        return len(self.X)
    def __getitem__(self, i):
        return self.X[i], self.y[i]

### building the MLP
class MLP(nn.Module):
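    def __init__(self):
        super().__init__()  # required before registering submodules
        self.layers = nn.Sequential(
            nn.Linear(13, 64),
            nn.ReLU(),  # non-linearity between layers (assumed, as is usual for an MLP)
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)
        )
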
    def forward(self, x):
        return self.layers(x)


dataset = TorchDataset(X, y)
trainloader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=0)

mlp = MLP()

loss_function = nn.L1Loss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.001)

### training loop
loss_vec = []

for epoch in range(1000):
    epoch_loss = 0

    for i, data in enumerate(trainloader, 0):
        inputs, targets = data
        inputs, targets = inputs.float(), targets.float()
        targets = targets.reshape((targets.shape[0], 1))

        ## Zero the gradient
        optimizer.zero_grad()

        ## Forward Pass
        outputs = mlp(inputs)

        ## compute loss
        loss = loss_function(outputs, targets)

        ## backward pass
        loss.backward()

        ## Optimization
        optimizer.step()

        ## Statistics
        epoch_loss += loss.item()

    loss_vec.append(epoch_loss)

## Visualizing the Loss curve
import plotly.express as px
px.line(y=loss_vec).show()

## Checking the R2 score between observed and predicted values    
from sklearn.metrics import r2_score

y_pred = mlp(torch.tensor(X, dtype=torch.float)).detach().numpy()

r2_score(y.flatten(), y_pred.flatten())   ##always a big negative number

Here is the loss plot
But the weirdest part is the predicted values


As you can see, my NN is predicting values completely outside the range of the targets.

Can you tell me what I am doing wrong here?


  • You train your NN on scaled inputs, because scale_data defaults to True in the constructor of TorchDataset.

    But you do not scale the inputs when you evaluate: you pass torch.tensor(X), built from the raw X, instead of the scaled data stored in the Dataset. The network therefore sees inputs on a completely different scale from the one it was trained on, which is the reason for the results you're seeing.

    Also: this is not the question you asked, but you should split the data into train, validation, and test sets, rather than testing on the training set.

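    To predict, iterate over a DataLoader built from the same Dataset, so the inputs are scaled the same way as in training. Use shuffle=False so the predictions line up with y, and torch.cat rather than torch.stack because the last batch can be smaller:

    evalloader = DataLoader(dataset, batch_size=10, shuffle=False)
    with torch.no_grad():
        y_pred = torch.cat([mlp(batch.float()) for batch, _ in evalloader]).numpy()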
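
The two points above (reuse the training scaler at evaluation, and evaluate on held-out data) can be combined in one small sketch. It is only an illustration under assumptions: the data is synthetic rather than the housing file, the network is a tiny stand-in for the MLP in the question, and the names scaler, to_tensor, pred_scaled and pred_raw are made up for this example.

```python
import numpy as np
import torch
from torch import nn
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (an assumption, not the real dataset):
# the features live far from zero, the targets are of order 1.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(300, 3))
y = (((X - 50.0) / 10.0) @ np.array([1.0, -2.0, 0.5]) + 1.0).reshape(-1, 1)

# Hold out a test set, then fit the scaler on the training part only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler().fit(X_train)


def to_tensor(a):
    """Convert a NumPy array to a float32 tensor."""
    return torch.tensor(a, dtype=torch.float32)


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_function = nn.MSELoss()

X_train_t = to_tensor(scaler.transform(X_train))
y_train_t = to_tensor(y_train)
for _ in range(300):  # full-batch training, enough for this toy problem
    optimizer.zero_grad()
    loss = loss_function(model(X_train_t), y_train_t)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    # Consistent: reuse the fitted scaler on the held-out inputs.
    pred_scaled = model(to_tensor(scaler.transform(X_test))).numpy()
    # The mistake from the question: feed raw, unscaled inputs.
    pred_raw = model(to_tensor(X_test)).numpy()

r2_scaled = r2_score(y_test, pred_scaled)
r2_raw = r2_score(y_test, pred_raw)
print(f"R2 with scaling: {r2_scaled:.3f}, without scaling: {r2_raw:.3f}")
```

Keeping the fitted scaler object around (rather than calling fit_transform inside the Dataset and discarding it) is the design choice that matters here: any new input, whether validation, test, or production data, can then go through the same transform before it reaches the network.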