I am new to using the MinMaxScaler, so please do not bite my head off if this is a very simple question. I have the following dataset:
sample_df.head(2)
ID S_LENGTH S_WIDTH P_LENGTH P_WIDTH SPECIES
-------------------------------------------------------------------
1 3.5 2.5 5.6 1.7 VIRGINICA
2 4.5 5.6 3.4 8.7 SETOSA
How do I apply normalisation to all the columns of this dataset except the ID and SPECIES columns?
I basically want to use preprocessing.MinMaxScaler() to apply normalisation, so that all the features are in the range 0 to 1.
This is the code I am using...
min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(sample_df)
...but when I execute it, I get this error:
ValueError: could not convert string to float: 'SETOSA'
Any help on how to accomplish what I want to do is much appreciated!
Also, my sincere apologies if this is a really dumb question, but I am new to this.
Thank you!
EDIT (SHOWING ERROR):
Alternatively, if I do this...
min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(sample_df[['S_LENGTH', 'S_WIDTH']])
sample_df.head(2)
...I get this error:
AttributeError: 'numpy.ndarray' object has no attribute 'sample'
I doubt this will be very helpful, but you can get the numeric columns with:
num_df = df[[i for i in df.columns if df[i].dtypes != 'O']]
num_df
Out[126]:
ID S_LENGTH S_WIDTH P_LENGTH P_WIDTH
0 1 3.5 2.5 5.6 1.7
1 2 4.5 5.6 3.4 8.7
and then apply the MinMaxScaler to it:
from sklearn import preprocessing
min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(num_df)
Out[129]:
array([[0., 0., 0., 1., 0.],
[1., 1., 1., 0., 1.]])
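Note that the output above is a NumPy array, not a DataFrame: by default, fit_transform returns a new array and leaves the original DataFrame unchanged. That is one plausible reason for the AttributeError in the question, assuming the result was assigned to a variable and then used like a DataFrame. A minimal sketch of wrapping the array back into a DataFrame (the names result and scaled_df are just for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

sample_df = pd.DataFrame({"S_LENGTH": [3.5, 4.5], "S_WIDTH": [2.5, 5.6]})
cols = ["S_LENGTH", "S_WIDTH"]

# fit_transform returns a new object; sample_df itself is not modified.
result = MinMaxScaler().fit_transform(sample_df[cols])
is_array = isinstance(result, np.ndarray)  # a NumPy array with default settings

# Wrapping the array restores the column labels and the original index,
# so DataFrame methods such as head() are available again.
scaled_df = pd.DataFrame(result, columns=cols, index=sample_df.index)
print(scaled_df.head(2))
```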
EDIT:
Using your df:
df
Out[162]:
ID S_LENGTH S_WIDTH P_LENGTH P_WIDTH SPECIES
0 1 3.5 2.5 5.6 1.7 VIRGINICA
1 2 4.5 5.6 3.4 8.7 SETOSA
Use the following code:
num_cols = [i for i in df.columns if df[i].dtypes != 'O']
# fit_transform returns a NumPy array, so wrap the result in a DataFrame to keep the column names
num_df = pd.DataFrame(min_max.fit_transform(df[num_cols]), columns=num_cols, index=df.index)
cat_df = df[[i for i in df.columns if df[i].dtypes == 'O']]
res = pd.merge(num_df, cat_df, left_index=True, right_index=True)
which will give you:
print(res)
ID S_LENGTH S_WIDTH P_LENGTH P_WIDTH SPECIES
0 0.0 0.0 0.0 1.0 0.0 VIRGINICA
1 1.0 1.0 1.0 0.0 1.0 SETOSA
Try the code line by line and let me know if this is what you need.
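The result above also scales the ID column, because ID is numeric. If ID should stay untouched, as the question asks, one option is to pick the feature columns explicitly and assign the scaled values back to them. This is only a sketch under that assumption; the names feature_cols and scaled are made up for the example, and the DataFrame is rebuilt from the two rows shown in the question:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
    "SPECIES": ["VIRGINICA", "SETOSA"],
})

# All numeric columns except ID, which is an identifier rather than a feature.
feature_cols = df.select_dtypes(include="number").columns.drop("ID")

# Work on a copy so the original df is left as it was.
scaled = df.copy()
scaled[feature_cols] = MinMaxScaler().fit_transform(df[feature_cols])
print(scaled)
```

Here ID and SPECIES keep their original values, and only the four measurement columns are rescaled to the range 0 to 1.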