I am new to machine learning in Python
and am trying to one-hot encode the dataframe below (only a snippet is shown):
id | country | device |
---|---|---|
100 | sg | samsung |
100 | ch | galaxy s |
200 | ab | pocophone |
200 | ee | iphone 1 |
200 | my | iphone 2 |
I would like the result to look something like this:
id | sg | ch | ab | ee | my |
---|---|---|---|---|---|
100 | 1 | 1 | 0 | 0 | 0 |
200 | 0 | 0 | 1 | 1 | 1 |
Would appreciate any advice and help, thank you all!
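Use pd.crosstab, then index the result by the countries in their order of first appearance so the columns are not sorted alphabetically:
>>> pd.crosstab(df['id'], df['country'])[df['country'].unique()]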
country sg ch ab ee my
id
100 1 1 0 0 0
200 0 0 1 1 1
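
For completeness, here is a minimal self-contained sketch of the same approach. The dataframe is rebuilt from the snippet in the question, and the variable names (`counts`, `order`, `one_hot`) are just illustrative. Note that `pd.crosstab` counts occurrences, so if an id could have the same country more than once, the sketch assumes you want to cap the values at 1 with `clip` to get a strict 0/1 encoding.

```python
import pandas as pd

# Rebuild the snippet from the question.
df = pd.DataFrame({
    "id": [100, 100, 200, 200, 200],
    "country": ["sg", "ch", "ab", "ee", "my"],
    "device": ["samsung", "galaxy s", "pocophone", "iphone 1", "iphone 2"],
})

# Count how often each country occurs for each id.
counts = pd.crosstab(df["id"], df["country"])

# crosstab sorts the columns alphabetically; restore first-appearance order.
order = df["country"].unique()

# Cap counts at 1 so repeated (id, country) pairs still give 0/1 values.
one_hot = counts[order].clip(upper=1)

print(one_hot)
```

If you need `id` back as a regular column rather than the index, `one_hot.reset_index()` should do it.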