I have a DataFrame like this:
id | month |
---|---|
1 | 1 |
1 | 3 |
1 | 4 |
1 | 6 |
I want to transform it into this:
id | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 1 | 0 | 1 | 1 | 0 | 1 |
I've tried this code:
ndf = df[['id']].join(pd.get_dummies(df['month'])).groupby('id').max()
But it produces this:
id | 1 | 3 | 4 | 6 |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
How can I insert the missing columns (2 and 5) even though they are not in the data?
You can use pd.crosstab instead, then build the full set of columns with pd.RangeIndex based on the min and max month, and finally use DataFrame.reindex (and optionally DataFrame.reset_index afterwards):
import pandas as pd
# RangeIndex excludes its stop value, so add 1 to include the last month
new_cols = pd.RangeIndex(df['month'].min(), df['month'].max() + 1)
res = (
    pd.crosstab(df['id'], df['month'])
    .reindex(columns=new_cols, fill_value=0)
    .reset_index()
)
Output:
>>> res
   id  1  2  3  4  5  6
0   1  1  0  1  1  0  1
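For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample DataFrame is rebuilt from the table in the question, and the names all_months and alt are just illustrative. It also shows that the same reindex step should work on the get_dummies attempt from the question, assuming dtype=int is passed so the dummy columns are integers rather than booleans.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"id": [1, 1, 1, 1], "month": [1, 3, 4, 6]})

# Every month from the smallest to the largest one present.
# The stop of RangeIndex is exclusive, hence the + 1.
all_months = pd.RangeIndex(df["month"].min(), df["month"].max() + 1)

# crosstab counts (id, month) pairs; reindex adds the absent months as 0
res = (
    pd.crosstab(df["id"], df["month"])
    .reindex(columns=all_months, fill_value=0)
    .reset_index()
)

# The get_dummies attempt from the question, with the same reindex step added
alt = (
    df[["id"]]
    .join(pd.get_dummies(df["month"], dtype=int))
    .groupby("id")
    .max()
    .reindex(columns=all_months, fill_value=0)
    .reset_index()
)

print(res)
print(alt)
```

One difference to keep in mind: crosstab returns counts, so an (id, month) pair that occurs more than once gives a value above 1, while the get_dummies plus max variant always stays at 0 or 1. With the sample data both give the same result.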