python pandas csv dataframe export-to-csv

How to use an aggregate function on two columns in Pandas without messing up the index?

I'm having this issue where using agg(['sum', 'count']) is outputting "subcolumns" rather than new, exclusive columns.

This is a problem because I need to export the results to a csv for salesforce upload, and the subcolumns are messing up the formatting.

So, first there is this dataframe:

Order ID | Company ID | Company Name | Product   | Date    | Units in order | Units delivered | Platform
1          J01          John's        Bubblegum   01-01-20     100             100            On Prem
2          J01          John's        Bubblegum   01-01-20     200             150            On Prem
3          M01          Mary's        Jawbreakers  01-01-20    100             100            Online

After that, I create a subset DF from it

Subset = df[['Company ID', 'Company Name', 'Platform', 'Date', 'Units in order', 'Units delivered']]

Then I proceed to group and apply aggregate on the subset to get the sum of the units columns and the count (to know how many orders were made)

Subset = Subset.groupby(['Company ID', 'Company Name', 'Platform', 'Date'])['Units in order', 'Units delivered'].agg(['sum', 'count']).reset_index()

However, this messes up the indexes of the columns. Instead of outputting

Company ID | Company Name | Platform | Date | Sum of UIO | Count of UIO | Sum of UD | Count of UD

It's outputting

Company ID | Company Name | Platform | Date | Sum of UIO | Units in order | Units delivered |       
                                                            sum | count      sum | count

Effectively creating 4 "subcolumns". Of course when exporting to CSV, this creates a "format" that is not properly read by anything other than pandas.

How can this issue be solved and why is it happening? At first I thought it was because there was a naming conflict since I'd have 2 sum and 2 count columns, but I tried to rename them and it didnt fix the issue.

Solution

You can solve this issue by using pandas NamedAgg which is available since pandas 0.25.0 and python 3.5.0 if not mistaken. The code would be as follows:

Subset = Subset.groupby(['Company ID', 'Company Name', 'Platform', 'Date']).agg(
   Units_in_order_sum = pd.NamedAgg(column='Units in order', aggfunc='sum'),
   Units_in_order_count = pd.NamedAgg(column='Units in order', aggfunc='count'),
   Units_delivered_sum = pd.NamedAgg(column='Units delivered',aggfunc='sum'),
   Units_delivered_count = pd.NamedAgg(column='Units delivered',aggfunc='count')).reset_index()