
Ordinal encoder issues with NaN values


I have a dataframe in which missing values appear as blank strings, so I replaced them with NaN using a regex. The problem arises when I try to use ordinal encoding to replace the categorical values. My code so far is the following:

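    import numpy as np
    import pandas as pd
    from sklearn import preprocessing

    x=pd.DataFrame(np.array([30,"lawyer","France",
                             25,"clerk","Italy",
                             22," ","Germany",
                             40,"salesman","EEUU",
                             34,"lawyer"," ",
                             50,"salesman","France"]
                ).reshape(6,3))
    x.columns=["age","job","country"]
    # Turn empty or whitespace-only strings into NaN
    x = x.replace(r'^\s*$', np.nan, regex=True)

    oe=preprocessing.OrdinalEncoder()
    # The encoder expects a 2-D input, hence the double brackets
    x["job"]=oe.fit_transform(x[["job"]]).ravel()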

I got the following error:

Input contains NaN

I would like the job column to be replaced with numbers such as [1,2,-1,3,1,3].


Solution

  • You can try factorize. Note that the category codes start at 0 here, and missing values are encoded as -1.

    x.job.mask(x.job==' ').factorize()[0]
    Out[210]: array([ 0,  1, -1,  2,  0,  2], dtype=int32)
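If the 1-based codes from the question are needed, the factorize result can be shifted while the -1 marker for missing values is kept. The following is a minimal sketch under a few assumptions: the frame is rebuilt from a dict rather than from the reshaped array, and the names codes, uniques and job_codes are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built column by column (an assumption made
# for readability; the question builds it from a reshaped array).
x = pd.DataFrame(
    {
        "age": [30, 25, 22, 40, 34, 50],
        "job": ["lawyer", "clerk", " ", "salesman", "lawyer", "salesman"],
        "country": ["France", "Italy", "Germany", "EEUU", " ", "France"],
    }
)

# Turn empty or whitespace-only strings into NaN, as in the question.
x = x.replace(r"^\s*$", np.nan, regex=True)

# factorize assigns 0-based codes in order of first appearance and
# marks missing values with -1.
codes, uniques = pd.factorize(x["job"])

# Shift the valid codes to start at 1 and leave the -1 marker untouched.
job_codes = np.where(codes >= 0, codes + 1, -1)
x["job"] = job_codes
```

Here uniques holds the original labels in code order, so a code can be mapped back to its job name if needed.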