Search code examples
python-3.xpandasdataframepivot-tableone-hot-encoding

Dataframe same key with multiple values and One Hot Encoding (Python, Pandas)?


I have the following toy dataframe example:

import pandas as pd
df = pd.DataFrame({'id': [0, 0, 0], 'key':['role', 'role', 'role'], 'val': ['admin', 'local_usr', 'fin_dep_ds']})

As you may see the same key has multiple values. When I pivot the table, the agg function I am using is ",".join() but the final goal is to apply one hot encoding on these values:

pd.DataFrame(pd.pivot_table(df, \
                            values='val', \
                            index='id', \
                            columns='key', \
                            aggfunc=','.join).to_records())

Final goal:

id admin local_usr fin_dep_ds
0  1     1         1

Please advise how can I do it? Any best practice to tackle this situation?


Solution

  • IIUC, .pivot_table() with aggfunc="size" produces your result:

    x = df.pivot_table(index="id", columns="val", aggfunc="size").reset_index()
    x.columns.name = None
    print(x)
    

    Prints:

       id  admin  fin_dep_ds  local_usr
    0   0      1           1          1