Suppose I have the following DataFrame:
df = pd.DataFrame({
'year': [2015, 2015, 2018, 2018, 2020],
'total': [100, 200, 50, 150, 400],
'tax': [10, 20, 5, 15, 40]})
year total tax
0 2015 100 10
1 2015 200 20
2 2018 50 5
3 2018 150 15
4 2020 400 40
I want to sum up the total and tax columns by year and obtain the size
at the same time.
The following code gives me the sum of the two columns:
df_total_tax = df.groupby('year', as_index=False)[['total', 'tax']].apply(np.sum)
However, I can't figure out how to also include a column for size
at the same time. Must I perform a different groupby
, then use .size()
and then append that column to df_total_tax
? Or is there an easier way?
The end result would look like this:
year total tax size
0 2015 300 30 2
1 2018 200 20 2
2 2020 400 40 1
You can specify for each column separately aggregate function in named aggregation:
df.groupby('year', as_index=False).agg(total=('total', 'sum'),
tax=('tax', 'sum'),
size=('tax', 'size'))
year total tax size
0 2015 300 30 2
1 2018 200 20 2
2 2020 400 40 1