
Ordinal encoding in Pandas


Is there a way to have pandas.get_dummies output the numerical representation in one column rather than a separate column for each option?

Concretely, pandas.get_dummies currently gives me a separate column for every option:

Size    Size_Big  Size_Medium  Size_Small
Big            1            0           0
Medium         0            1           0
Small          0            0           1

But I'm looking for output more like the following:

Size    Size_Numerical
Big                  1
Medium               2
Small                3

Solution

  • You don't want dummies, you want factors/categories.

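    Use pandas.factorize, which gives each distinct value an integer code in order of first appearance. Codes start at 0, hence the + 1: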

    import pandas as pd
    df['Size_Numerical'] = pd.factorize(df['Size'])[0] + 1
    

    output:

         Size  Size_Numerical
    0     Big               1
    1  Medium               2
    2   Small               3
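
    If the numbers need to follow a fixed order regardless of how the rows are arranged, one option is an ordered pandas.Categorical. The sketch below is only an illustration: the sample rows and the size_order list are assumptions made up to match the question, not part of the original data. Values missing from size_order get code -1, so they would end up as 0 after the + 1.

```python
import pandas as pd

# Hypothetical sample data; rows deliberately not in Big/Medium/Small order.
df = pd.DataFrame({"Size": ["Medium", "Small", "Big", "Medium"]})

# Assumed ordering, chosen to reproduce the numbering asked for in the question.
size_order = ["Big", "Medium", "Small"]

# Categorical codes are 0-based positions in size_order; add 1 to start at 1.
cat = pd.Categorical(df["Size"], categories=size_order, ordered=True)
df["Size_Numerical"] = cat.codes + 1

print(df)
```

    With pd.factorize the same rows would be numbered by first appearance (Medium would get 1), so the two approaches only agree when the data happens to arrive in the desired order.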