Tags: python, pandas, dataframe, machine-learning, label-encoding

How do I make LabelEncoder assign the same number to a value at prediction time?


I have the following column in my DataFrame:

    city
    London
    Paris
    New York
    ...

I label-encode this column, and each city gets an integer code: 0 for London, 1 for New York and 2 for Paris. But when I later pass a single value, New York, to get a prediction from the model, it is encoded as 0. How can I keep the codes consistent? If New York was encoded as 1 during training, it should be encoded as 1 again at prediction time.
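
For reference, the codes come from the fitted encoder's classes_ attribute: LabelEncoder stores the unique values in sorted order, and each value's code is its index in that array. A minimal sketch with the three example cities (the plain list stands in for the real column):

```python
from sklearn.preprocessing import LabelEncoder

cities = ['London', 'Paris', 'New York']

encoder = LabelEncoder()
codes = encoder.fit_transform(cities)

# classes_ holds the unique values in sorted order;
# each value's integer code is its index in that array
mapping = {str(city): code for code, city in enumerate(encoder.classes_)}

print(encoder.classes_.tolist())  # ['London', 'New York', 'Paris']
print(codes.tolist())             # [0, 2, 1]
print(mapping)                    # {'London': 0, 'New York': 1, 'Paris': 2}
```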

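Code:

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    df = pd.DataFrame({'city': ['London', 'Paris', 'New York']})  # sample of the column above
    labelencoder = LabelEncoder()
    df['city'] = labelencoder.fit_transform(df['city'])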
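
The behavior described in the question can be reproduced with the sketch below. It assumes, since the question does not show the prediction code, that fit_transform is called again on the same encoder with only the single new value:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()

# Training phase: the encoder learns all three cities
train_codes = encoder.fit_transform(['London', 'Paris', 'New York'])
print(train_codes.tolist())  # [0, 2, 1]

# Prediction phase (the mistake): fit_transform re-fits the encoder,
# so it forgets the training classes and only knows 'New York'
predict_codes = encoder.fit_transform(['New York'])
print(predict_codes.tolist())     # [0]
print(encoder.classes_.tolist())  # ['New York']
```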

Solution

  • Fit the encoder once on the training data with fit or fit_transform, then call only transform on any data you want to encode later, such as the values you pass for prediction. Calling fit_transform on the new data re-fits the encoder on that data alone, so a single value is always encoded as 0:

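    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    df = pd.DataFrame({'city': ['London', 'Paris', 'New York']})
    labelencoder = LabelEncoder()

    # Fit once on the training data
    df['label'] = labelencoder.fit_transform(df['city'])
    # df
    #        city  label
    # 0    London      0
    # 1     Paris      2
    # 2  New York      1

    # Later, at prediction time: transform only, without re-fitting
    labelencoder.transform(['New York'])
    # array([1])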
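
If training and prediction run in separate scripts or processes, the fitted encoder itself has to be saved and reloaded, because a freshly created LabelEncoder knows nothing about the training classes. A minimal sketch, assuming joblib is available (it is installed alongside scikit-learn) and using a hypothetical file name, city_encoder.joblib. It also shows that a city absent from the training data makes transform raise ValueError instead of receiving a new code:

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Training script: fit once and save the fitted encoder
encoder = LabelEncoder()
encoder.fit(['London', 'Paris', 'New York'])
path = os.path.join(tempfile.gettempdir(), 'city_encoder.joblib')  # hypothetical file name
joblib.dump(encoder, path)

# Prediction script: load the same fitted encoder and only call transform
loaded = joblib.load(path)
print(loaded.transform(['New York']).tolist())  # [1]

# A city never seen during training cannot be encoded
try:
    loaded.transform(['Berlin'])
except ValueError as exc:
    print('unseen label:', exc)
```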