
Performing one-hot encoding on a dataframe with multiple groups


I am new to machine learning in Python and am trying to perform one-hot encoding on the dataframe below (only a snippet is shown):

id   country  device
100  sg       samsung
100  ch       galaxy s
200  ab       pocophone
200  ee       iphone 1
200  my       iphone 2
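For anyone who wants to reproduce this, the snippet above can be rebuilt as a pandas DataFrame. This is a minimal sketch that assumes `id` holds integers and `country` and `device` hold plain strings, exactly as shown:

```python
import pandas as pd

# Rebuild the sample rows from the question. The 'device' values contain
# spaces, so each one is kept as a single string.
df = pd.DataFrame({
    "id": [100, 100, 200, 200, 200],
    "country": ["sg", "ch", "ab", "ee", "my"],
    "device": ["samsung", "galaxy s", "pocophone", "iphone 1", "iphone 2"],
})
print(df)
```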

I would like the result to look something like this:

id   sg  ch  ab  ee  my
100  1   1   0   0   0
200  0   0   1   1   1

I would appreciate any advice or help. Thank you all!


Solution

  • Use pd.crosstab:

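    >>> import pandas as pd
    >>> # crosstab sorts the country columns alphabetically, so select them
    >>> # again in their order of first appearance
    >>> pd.crosstab(df['id'], df['country'])[df['country'].unique()]
    country  sg  ch  ab  ee  my
    id
    100       1   1   0   0   0
    200       0   0   1   1   1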
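Note that `pd.crosstab` counts occurrences, so an `id` that lists the same country twice would get a 2 rather than a 1. A sketch of an alternative that always yields 0/1 flags uses `pd.get_dummies` followed by a per-`id` `max()`. This is a swapped-in technique, not part of the answer above, and it assumes the same three-column layout as the question's snippet:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [100, 100, 200, 200, 200],
    "country": ["sg", "ch", "ab", "ee", "my"],
    "device": ["samsung", "galaxy s", "pocophone", "iphone 1", "iphone 2"],
})

# One 0/1 indicator column per country (get_dummies sorts these columns).
dummies = pd.get_dummies(df["country"], dtype=int)

# Collapse rows per id; max() keeps each flag at 0 or 1 even if a
# country appears more than once for the same id.
result = dummies.groupby(df["id"]).max()

# Restore the order in which the countries first appear in the data.
result = result[df["country"].unique()]
print(result)
```

If you prefer to stay with `crosstab`, appending `.clip(upper=1)` to its result caps repeated counts at 1 in the same way.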