I have data which is already summed and grouped in a dataframe named df
:
| id | segment | region | points |
|----|---------|----------|--------|
| 90 | Gold | APAC | 21 |
| 90 | Silver | EMEA | 34 |
| 90 | Bronze | AMERICAS | 564 |
| 90 | Gold | EMEA | 3939 |
| 90 | Silver | Americas | 989 |
| 90 | Gold | EMEA | 43 |
| 90 | Silver | APAC | 13 |
| 90 | Bronze | AMERICAS | 567 |
I would like to pivot both segment
and region
to columns and then total the points for those columns. The output would look like the below based on the input above:
| id | Gold | Silver | Bronze | APAC | EMEA | AMERICAS |
|----|------|--------|--------|------|------|----------|
| 90 | 4003 | 1036 | 1131 | 34 | 4016 | 2120 |
What I've tried so far is to convert my dataframe to Pandas and then use the built in pivot_table
function.
import pandas as pd
df_pd = df.toPandas()
pd.pivot_table(df_pd, values = 'points', index=['id'], columns = ['segment', 'region']).reset_index()
The code works but the output isn't as expected. Instead of getting totals with each region
and segment
as a column, I get two rows of columns. Within the two rows of columns, it appears that region
is a subgrouping of segment
. See below (note, numbers don't match due to random numbers being used in the sample data, I'm more concerned about the shape):
Solution with double pivot_table
and DataFrame.join
:
df1 = pd.pivot_table(df_pd, values='points', index='id', columns='segment', aggfunc='sum')
df2 = pd.pivot_table(df_pd, values='points', index='id', columns='region', aggfunc='sum')
df = df1.join(df2).reset_index()
print (df)
id Bronze Gold Silver AMERICAS APAC Americas EMEA
0 90 1131 4003 1036 1131 34 989 4016
In your solution is possible add sum
per first and per second level of MultiIndex in columns
with join
:
df3 = pd.pivot_table(df_pd,
values = 'points',
index='id',
columns = ['segment', 'region'],
aggfunc='sum')
df = df3.sum(level=0, axis=1).join(df3.sum(level=1, axis=1)).reset_index()
print (df)
id Bronze Gold Silver AMERICAS APAC EMEA Americas
0 90 1131 4003 1036 1131 34 4016 989