MLPRegressor working but results don't make any sense

I am building a neural network with my research data in two ways: with a statistical program (SPSS) and with Python, using scikit-learn's MLPRegressor. My code runs without errors, but the results do not make sense. The R2 score should be around 0.70 (it is -4147.64), and the plot of predicted against expected values should be almost linear (it is just a straight line at a constant distance from the X axis). Also, the X and Y axes should both have values ranging from 0 to 180, which is not the case (X from 20 to 100, Y from -4100 to -3500).

If any of you can give a hand I would really appreciate it. Thank you!!!!!!

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn import neighbors, datasets, preprocessing 
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

vhdata = pd.read_csv('vhrawdata.csv')
vhdata.head()

X = vhdata[['PA NH4', 'PH NH4', 'PA K', 'PH K', 'PA NH4 + PA K', 'PH NH4 + PH K', 'PA IS', 'PH IS']]
y = vhdata['PMI']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

from sklearn.preprocessing import Normalizer
scaler = Normalizer().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

nnref = MLPRegressor(hidden_layer_sizes = [4], activation = 'logistic', solver = 'sgd', alpha = 1, 
                     learning_rate= 'constant', learning_rate_init= 0.6, max_iter=40000, momentum= 
                     0.3).fit(X_train, y_train)

y_predictions= nnref.predict(X_test)

print('Accuracy of NN classifier on training set (R2 score): {:.2f}'.format(nnref.score(X_train_norm, y_train)))
print('Accuracy of NN classifier on test set (R2 score): {:.2f}'.format(nnref.score(X_test_norm, y_test)))

plt.figure()
plt.scatter(y_test,y_predictions, marker = 'o', color='red')
plt.xlabel('PMI expected (hrs)')
plt.ylabel('PMI predicted (hrs)')
plt.title('Correlation of PMI predicted by MLP regressor and the actual PMI')
plt.show()

Solution

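You have a couple of issues. First, it is important to use the right kind of scaling with an MLP. Normalizer rescales each sample (row) to unit norm, so it does not put the features (columns) on a common range. NNs generally train better when every feature lies in a small range such as 0 to 1, so consider using sklearn's MinMaxScaler to accomplish this.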

So:

from sklearn.preprocessing import Normalizer
scaler = Normalizer().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

Should be:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_norm = scaler.fit_transform(X_train)
# Reuse the min/max learned from the training set; do not refit on the test set
X_test_norm = scaler.transform(X_test)
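To see the difference between the two scalers, here is a small sketch on made-up numbers (the arrays are invented for illustration and are not your data). It shows that Normalizer works per row, MinMaxScaler works per column, and that the test set should be transformed with the min/max learned from the training set:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Made-up data: 3 training samples, 2 features on very different scales.
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])
X_test = np.array([[2.0, 200.0]])

# Normalizer rescales each ROW to unit L2 norm.
# The columns are not brought onto a common range.
row_normed = Normalizer().fit_transform(X_train)
row_norms = np.linalg.norm(row_normed, axis=1)

# MinMaxScaler rescales each COLUMN to [0, 1] using the training min/max.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
# The test set reuses the training min/max (transform, not fit_transform).
X_test_scaled = scaler.transform(X_test)

print(row_norms)        # every row has norm 1 after Normalizer
print(X_train_scaled)   # each column now spans 0 to 1
print(X_test_scaled)    # [[0.5  0.25]]
```

Refitting the scaler on the test set would map the same raw value to different scaled values in training and testing, which is why only transform is called on X_test.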

Next, you are training and predicting on the unscaled data, but then computing your scores on the scaled data. The model has to see the same scaled inputs when fitting, predicting, and scoring. Meaning:

nnref = MLPRegressor(hidden_layer_sizes = [4], activation = 'logistic', solver = 'sgd', alpha = 1, 
                     learning_rate= 'constant', learning_rate_init= 0.6, max_iter=40000, momentum= 
                     0.3).fit(X_train, y_train)

should be:

nnref = MLPRegressor(hidden_layer_sizes = [4], activation = 'logistic', solver = 'sgd', alpha = 1, 
                     learning_rate= 'constant', learning_rate_init= 0.6, max_iter=40000, momentum= 
                     0.3).fit(X_train_norm , y_train)

And...

y_predictions= nnref.predict(X_test)

Should be:

y_predictions= nnref.predict(X_test_norm)
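One way to avoid this kind of mismatch altogether is to wrap the scaler and the regressor in a single pipeline, so that fit, predict, and score always apply the same scaling. The following is only a sketch under assumptions: the data is synthetic (8 random features and a noisy target, standing in for your CSV), and it uses solver='lbfgs' instead of your sgd settings, purely to keep the example small and stable.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real data (assumption, not vhrawdata.csv).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(200, 8))
y = 1.8 * X.mean(axis=1) + rng.normal(0.0, 5.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler on the training data only and then
# applies that same scaling inside every predict() and score() call.
model = make_pipeline(
    MinMaxScaler(),
    MLPRegressor(hidden_layer_sizes=(4,), activation='logistic',
                 solver='lbfgs', max_iter=5000, random_state=0),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
train_r2 = model.score(X_train, y_train)
test_r2 = r2_score(y_test, y_pred)

print('R2 score on training set: {:.2f}'.format(train_r2))
print('R2 score on test set: {:.2f}'.format(test_r2))
```

With this layout you pass the raw X_train and X_test everywhere, and there is no separate X_train_norm / X_test_norm pair to mix up.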

Additional notes...

Scoring on your training data tells you little about how well the model predicts. It is being tested on the same data it learned from, so that score will usually be optimistic. Judge the model by its test-set score; a training score far above the test score is a sign of overfitting.