python, pandas, one-hot-encoding

Pandas One-Hot-Encoding with deterministic order


Say I have a categorical column in a DataFrame (for example, weekday) and I want to one-hot encode it. I am using pandas.get_dummies() to do this, but I can't see how to make the order deterministic. For example, I have these two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'visitors': [220, 240, 180, 210, 220, 260, 270], 'weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']})
df2 = pd.DataFrame({'visitors': [240, 180, 210, 220, 260, 270, 220], 'weekday': ['Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon']})

If I call get_dummies() on both, I get two different encodings. I understand that this is because the order is different, but is there a way to give any DataFrame the same encoding regardless of the order in which the values come in? For example: Mon=1000000, Tue=0100000, and so on.

pd.get_dummies(df1['weekday'])
pd.get_dummies(df2['weekday'])
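To see where the difference actually comes from, you can compare the column labels of the two results and then match the rows up by their weekday label. A minimal sketch; the variable names `d1`, `d2`, `aligned1` and `aligned2` are only illustrative, and `dtype=int` is passed so the dummies are 0/1 on pandas 2.x as well:

```python
import pandas as pd

df1 = pd.DataFrame({'visitors': [220, 240, 180, 210, 220, 260, 270],
                    'weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']})
df2 = pd.DataFrame({'visitors': [240, 180, 210, 220, 260, 270, 220],
                    'weekday': ['Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon']})

d1 = pd.get_dummies(df1['weekday'], dtype=int)
d2 = pd.get_dummies(df2['weekday'], dtype=int)

# Both results have the same columns in the same (sorted) order.
print(list(d1.columns) == list(d2.columns))  # True

# Compared row by row they differ, because the rows arrive in a different order.
print(d1.equals(d2))  # False

# Index each encoding by its weekday label and sort, so rows are matched
# by label instead of by position.
aligned1 = d1.set_index(df1['weekday']).sort_index()
aligned2 = d2.set_index(df2['weekday']).sort_index()
print(aligned1.equals(aligned2))  # True
```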

Solution

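  • The behavior of get_dummies is consistent: both results have the same columns in the same (alphabetically sorted) order. You're seeing a difference only because the rows are in a different order, so each weekday sits at a different row position.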

    print(df1)
    print(pd.get_dummies(df1['weekday']))
    
    print()
    
    print(df2)
    print(pd.get_dummies(df2['weekday']))
    

    Here's the output (shown as 0/1; pandas 2.0 and later print True/False by default unless you pass dtype=int):

       visitors weekday
    0       220     Mon
    1       240     Tue
    2       180     Wed
    3       210     Thu
    4       220     Fri
    5       260     Sat
    6       270     Sun
       Fri  Mon  Sat  Sun  Thu  Tue  Wed
    0    0    1    0    0    0    0    0
    1    0    0    0    0    0    1    0
    2    0    0    0    0    0    0    1
    3    0    0    0    0    1    0    0
    4    1    0    0    0    0    0    0
    5    0    0    1    0    0    0    0
    6    0    0    0    1    0    0    0
    
       visitors weekday
    0       240     Tue
    1       180     Wed
    2       210     Thu
    3       220     Fri
    4       260     Sat
    5       270     Sun
    6       220     Mon
       Fri  Mon  Sat  Sun  Thu  Tue  Wed
    0    0    0    0    0    0    1    0
    1    0    0    0    0    0    0    1
    2    0    0    0    0    1    0    0
    3    1    0    0    0    0    0    0
    4    0    0    1    0    0    0    0
    5    0    0    0    1    0    0    0
    6    0    1    0    0    0    0    0
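If you also want a specific column order such as Mon=1000000, Tue=0100000 (instead of get_dummies' default of sorted labels), one option is to cast the column to a categorical dtype with the categories listed explicitly before calling get_dummies. get_dummies then emits one column per category, in category order, even when some categories are missing from a given DataFrame. A sketch under those assumptions; `WEEKDAYS`, `WEEKDAY_DTYPE` and `encode_weekday` are hypothetical names:

```python
import pandas as pd

# Hypothetical fixed category order, Mon first.
WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
WEEKDAY_DTYPE = pd.CategoricalDtype(categories=WEEKDAYS)


def encode_weekday(weekday: pd.Series) -> pd.DataFrame:
    """One-hot encode a weekday column with a fixed column order."""
    # With a categorical dtype, get_dummies emits one column per category,
    # in category order, including categories absent from the data.
    return pd.get_dummies(weekday.astype(WEEKDAY_DTYPE), dtype=int)


df1 = pd.DataFrame({'visitors': [220, 240, 180, 210, 220, 260, 270],
                    'weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']})
df2 = pd.DataFrame({'visitors': [240, 180, 210, 220, 260, 270, 220],
                    'weekday': ['Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon']})

enc1 = encode_weekday(df1['weekday'])
enc2 = encode_weekday(df2['weekday'])
print(list(enc1.columns))  # ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# 'Mon' encodes to 1000000 in both frames, whichever row it sits in.
print(enc1.loc[0].tolist())  # [1, 0, 0, 0, 0, 0, 0]  (df1 row 0 is Mon)
print(enc2.loc[6].tolist())  # [1, 0, 0, 0, 0, 0, 0]  (df2 row 6 is Mon)

# A frame containing only some weekdays still gets all seven columns.
partial = encode_weekday(pd.Series(['Sat', 'Mon']))
print(partial.shape)  # (2, 7)
```

Values that are not in the category list become NaN after the cast and therefore encode as an all-zero row, so it may be worth validating the input if unexpected labels are possible.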