My original dataframe is similar to the one below:
df= pd.DataFrame({'Variation' : ['A']*5 + ['B']*3 + ['A']*4,
'id': [11]*4 + [12] + [15]*2 + [17] + [20]*4,
'steps' : ['start','step1','step2','end','end','step1','step2','step1','start','step1','step2','end']})
I wanted to create a pivot table from this dataframe for which I have used the below mentioned code:
df1=df.pivot_table(index=['Variation'], columns=['steps'],
values='id', aggfunc='count', fill_value=0)
However, I also wanted to look at the total distinct count of the id's as well. Can someone please let me know how to achieve this? My expected output should be:
| Variation | Total id | Total start | Total step1 | Total step2 | Total end |
|-----------|----------|-------------|-------------|-------------|-----------|
| A | 3 | 2 | 2 | 2 | 3 |
| B | 2 | 0 | 2 | 1 | 0 |
df1 = df1.join(df.groupby('Variation')['id'].nunique().rename('Total id'))
print(df1)
end start step1 step2 Total id
Variation
A 3 2 2 2 3
B 0 0 2 1 2
If need column after Variation
:
c = ['id'] + df['steps'].unique().tolist()
df1 = (df1.join(df.groupby('Variation')['id'].nunique())
.reindex(columns=c)
.add_prefix('Total ')
.reset_index()
.rename_axis(None, axis=1))
print(df1)
Variation Total id Total start Total step1 Total step2 Total end
0 A 3 2 2 2 3
1 B 2 0 2 1 0