
How to apply normalisation with MinMaxScaler() to all columns but exclude the categorical ones?


I am new to using MinMaxScaler, so please do not bite my head off if this is a very, very simple question. I have the following dataset:

sample_df.head(2)

ID     S_LENGTH     S_WIDTH     P_LENGTH     P_WIDTH     SPECIES
-------------------------------------------------------------------
1      3.5          2.5          5.6         1.7        VIRGINICA
2      4.5          5.6          3.4         8.7         SETOSA

So how do I apply normalisation with the code below to all my columns except ID and SPECIES?

I basically want to use preprocessing.MinMaxScaler() to apply normalisation so that all the features are in the range 0 to 1.
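For reference, MinMaxScaler with its default feature_range=(0, 1) maps each column x to (x - min(x)) / (max(x) - min(x)), computed column by column. A minimal sketch, assuming the four measurement columns are the features and using the two sample rows shown above, that applies this formula by hand with pandas and compares the result with the scaler:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# The two sample rows from the question
sample_df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
    "SPECIES": ["VIRGINICA", "SETOSA"],
})
features = ["S_LENGTH", "S_WIDTH", "P_LENGTH", "P_WIDTH"]  # assumed feature columns

# Column-wise min-max formula: (x - min) / (max - min)
x = sample_df[features]
manual = (x - x.min()) / (x.max() - x.min())

# The scaler returns the same values, but as a NumPy array
scaled = MinMaxScaler().fit_transform(x)
print(np.allclose(manual.to_numpy(), scaled))  # True
```

Note that a column whose minimum equals its maximum would divide by zero in the hand-written version; MinMaxScaler guards against that case internally.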

This is the code I am using...

min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(sample_df)

...but when I execute it, I get this error:

ValueError: could not convert string to float: 'SETOSA'

Any help on how to accomplish what I want to do is much appreciated!

Also, my sincere apologies if this is a really dumb question, but I am new to this.

Thank you!

EDIT (SHOWING ERROR):

Alternatively, if I do this...

min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(sample_df[['S_LENGTH', 'S_WIDTH']])

sample_df.head(2)

...I get this error:

AttributeError: 'numpy.ndarray' object has no attribute 'sample'
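The AttributeError above names numpy.ndarray, which suggests a pandas method ended up being called on the scaler's output: fit_transform returns a plain NumPy array, not a DataFrame, and it does not modify sample_df itself. A minimal sketch, assuming the two sample rows above and that the goal is to scale the chosen columns in place, writes the array back into those same columns:

```python
import pandas as pd
from sklearn import preprocessing

# The two sample rows from the question
sample_df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
    "SPECIES": ["VIRGINICA", "SETOSA"],
})
cols = ["S_LENGTH", "S_WIDTH"]

min_max = preprocessing.MinMaxScaler()
result = min_max.fit_transform(sample_df[cols])
print(type(result))  # <class 'numpy.ndarray'>

# Write the scaled values back into the same columns; ID and SPECIES are untouched
sample_df[cols] = result
print(sample_df.head(2))
```

Because the scaled array is assigned back to sample_df, methods such as head() keep working on the DataFrame.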

Solution

  • I doubt this will be very helpful, but you can get the numeric columns with:

    num_df = df[[i for i in df.columns if df[i].dtypes != 'O']]
    
    num_df
    Out[126]: 
       ID  S_LENGTH  S_WIDTH  P_LENGTH  P_WIDTH
    0   1       3.5      2.5       5.6      1.7
    1   2       4.5      5.6       3.4      8.7
    

    and then apply MinMaxScaler to it:

    from sklearn import preprocessing
    
    min_max = preprocessing.MinMaxScaler()
    min_max.fit_transform(num_df)
    
    Out[129]:
    array([[0., 0., 0., 1., 0.],
           [1., 1., 1., 0., 1.]])
    

    EDIT: Using your df:

    df
    Out[162]: 
       ID  S_LENGTH  S_WIDTH  P_LENGTH  P_WIDTH    SPECIES
    0   1       3.5      2.5       5.6      1.7  VIRGINICA
    1   2       4.5      5.6       3.4      8.7     SETOSA
    

    Use the following code:

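    import pandas as pd
    
    num_cols = [i for i in df.columns if df[i].dtypes != 'O']  # numeric (non-object) columns
    # fit_transform returns a NumPy array, so wrap it back into a DataFrame with the original labels
    num_df = pd.DataFrame(min_max.fit_transform(df[num_cols]), columns=num_cols, index=df.index)
    cat_df = df[[i for i in df.columns if df[i].dtypes == 'O']]  # categorical (object) columns
    res = pd.merge(num_df, cat_df, left_index=True, right_index=True)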
    

    which will give you:

    print(res)
    
        ID  S_LENGTH  S_WIDTH  P_LENGTH  P_WIDTH    SPECIES
    0  0.0       0.0      0.0       1.0      0.0  VIRGINICA
    1  1.0       1.0      1.0       0.0      1.0     SETOSA
    

    Try the code line by line and let me know if this is what you need.
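The answer's code also rescales ID, which the question wanted to leave alone. A minimal sketch, assuming ID is the only numeric column to skip, selects the numeric columns with DataFrame.select_dtypes (equivalent here to the dtypes != 'O' list comprehension), drops ID from that list, and writes the scaled values back in place:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# The two sample rows from the question
df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
    "SPECIES": ["VIRGINICA", "SETOSA"],
})

# Numeric columns, minus the identifier that should not be scaled
feature_cols = df.select_dtypes(include="number").columns.drop("ID")

scaler = MinMaxScaler()
df[feature_cols] = scaler.fit_transform(df[feature_cols])
print(df)
```

Assigning back to df[feature_cols] avoids the separate merge step and keeps the original column order, with ID and SPECIES left as they were.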