python, logistic-regression, one-hot-encoding

One-hot encoding gives NaN values in Python


I have a classification case study where I am using a logistic regression model. I want to use one-hot encoding to convert the values of my categorical column (SalStat) into 0 and 1. This is my code:

data2["SalStat"] = data2["SalStat"].map({"less than or equal to 50,000":0, "greater than 50,000":1})
print(data2["SalStat"])

The code above does not convert the values to 0 and 1; it converts them to NaN instead. Where am I going wrong?

PS: The SalStat column classifies rows as "less than or equal to 50,000" or "greater than 50,000"


Solution

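  • The code does not raise an error. Series.map returns NaN for every value that is not a key in the dictionary, so the strings stored in the SalStat column most likely do not match the typed keys exactly (for example, because of leading or trailing whitespace or different capitalization). It is safer to read the actual values from the column and use them as keys instead of typing them manually.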

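    # unique() returns values in order of first appearance,
    # so print them to confirm which label ends up as 0 and which as 1.
    values = data2["SalStat"].unique()
    print(values)
    val_1, val_2 = values[0], values[1]
    
    data2["SalStat"] = data2["SalStat"].map({val_1: 0, val_2: 1})
    print(data2["SalStat"])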
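If the mismatch turns out to be stray whitespace, an alternative is to clean the column and keep the explicit mapping, which fixes which label becomes 0 and which becomes 1. The following is a minimal sketch, not the asker's actual data: the sample DataFrame and the leading space in each label are assumptions made for illustration, and the names `mapping`, `raw_result`, `cleaned`, and `encoded` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data. The leading space in each label is an assumption
# that mimics a common artifact of reading CSV files.
data2 = pd.DataFrame({
    "SalStat": [
        " less than or equal to 50,000",
        " greater than 50,000",
        " less than or equal to 50,000",
    ]
})

mapping = {"less than or equal to 50,000": 0, "greater than 50,000": 1}

# Mapping the raw column yields NaN because no dictionary key matches exactly.
raw_result = data2["SalStat"].map(mapping)

# repr() makes hidden whitespace in the labels visible.
print([repr(v) for v in data2["SalStat"].unique()])

# Strip surrounding whitespace, then apply the explicit mapping.
cleaned = data2["SalStat"].str.strip()
encoded = cleaned.map(mapping)

# Fail loudly if any label is still unmapped instead of silently keeping NaN.
assert encoded.notna().all(), "unmapped labels remain in SalStat"
encoded = encoded.astype(int)
print(encoded.tolist())
```

Compared with taking the keys from unique(), the explicit dictionary does not depend on the order in which the labels first appear in the data.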