I am coding a machine learning algorithm using Keras and I need to normalise my data before feeding it through the model. I have 3 inputs organised into a 2D array, with each column making up one input.
import tensorflow as tf
import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
#Importing all the required modules
raw_data = np.array([]) #Defining numpy array for training data
val_data = np.array([]) #Defining numpy array for validation data
test = np.array([]) #Defining numpy array for test data
rawfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Training.txt'
valfilepath = r'C:\Users\***\Desktop\***\Unprocessed_Data_For_Validation.txt'
testfilepath = r'C:\Users\***\Desktop\***\h4t6usedforprediction.txt' #Filepaths
raw_data = np.loadtxt(rawfilepath)
val_data = np.loadtxt(valfilepath)
test = np.loadtxt(testfilepath) #Loading contents of text files into their respective arrays
X = raw_data[:, 1:4] #Splitting the data, X contains the coordinate position, initial shear and initial
Y = raw_data[:, 0] #Splitting the data, Y contains the measured height
X_Val = val_data[:, 1:4]
Y_Val = val_data[:, 0]
X_test = test[:, 1:4]
Y_test = test[:, 0]
#print(X_test)
#print(Y_test)
print(X)
print(Y)
scaler = MinMaxScaler()
Xnorm = scaler.fit_transform(X)
Ynorm = scaler.fit_transform(Y.reshape(-1,1))
Xvalnorm = scaler.fit_transform(X_Val)
Yvalnorm = scaler.fit_transform(Y_Val.reshape(-1,1))
Xtestnorm = scaler.fit_transform(X_test)
Ytestnorm = scaler.fit_transform(Y_test.reshape(-1,1))
The Y variables are normalising fine; however, I think the X variables are being normalised over the whole array rather than column by column.
These are the inputs the model is using to make predictions:
X=[0.94941569 0. 0. ], Predicted=[0.02409407]
X=[0.95664225 0. 0. ], Predicted=[0.02374389]
X=[0.93496738 0. 0. ], Predicted=[0.02480936]
X=[0.94219233 0. 0. ], Predicted=[0.02444912]
X=[0.92774402 0. 0. ], Predicted=[0.02517468]
X=[0.92052067 0. 0. ], Predicted=[0.02554525]
X=[0.91329892 0. 0. ], Predicted=[0.02592104]
X=[0.90607877 0. 0. ], Predicted=[0.02630214]
X=[0.89885863 0. 0. ], Predicted=[0.02668868]
X=[0.89163848 0. 0. ], Predicted=[0.02708073]
X=[0.88441994 0. 0. ], Predicted=[0.0274783]
X=[0.87720299 0. 0. ], Predicted=[0.02788144]
Let's do this by parts:
1 - If X and Y are your training set, calling fit_transform on that set is correct. But you cannot fit_transform your validation and test sets again; you have to just transform them using the scaler you previously fitted. Note that X and Y also need their own scalers, because refitting one scaler on Y.reshape(-1,1) throws away the per-column statistics it learned from X (and would make transform(X_Val) fail on the shape mismatch):
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
Xnorm = x_scaler.fit_transform(X) #Fit on the training inputs only
Ynorm = y_scaler.fit_transform(Y.reshape(-1,1)) #Separate scaler for the target
Xvalnorm = x_scaler.transform(X_Val) #Reuse the training-set scaling, no refitting
Yvalnorm = y_scaler.transform(Y_Val.reshape(-1,1))
Xtestnorm = x_scaler.transform(X_test)
Ytestnorm = y_scaler.transform(Y_test.reshape(-1,1))
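A side benefit of keeping a separate scaler for Y is that you can map the model's normalised predictions back into the original height units with inverse_transform. A minimal sketch, assuming model is your trained Keras network from earlier:
pred_norm = model.predict(Xtestnorm) #Predictions in the scaled [0, 1] space
pred = y_scaler.inverse_transform(pred_norm) #Back to the original height units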
2 - I am assuming the values of X you have posted at the end are already what you got from the normalisation. So, I have created my_X just to demonstrate how to use sklearn to normalise some data:
my_X = np.array([[-3, 2, 4], [-6, 4, 1], [0, 10, 15], [12, 18, 31]])
scaler = MinMaxScaler()
scaler.fit_transform(my_X)
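For reference, MinMaxScaler scales each column independently, so every column is mapped to [0, 1] using its own minimum and maximum. Working it out by hand from my_X's columns, the result should be:
print(scaler.fit_transform(my_X))
# [[0.16666667 0.         0.1       ]
#  [0.         0.125      0.        ]
#  [0.33333333 0.5        0.46666667]
#  [1.         1.         1.        ]]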
Just swap the values in my_X for the values you have in your X.