Search code examples
pandasgroup-by

Pandas Groupby, Join and Sum


Using Pandas Im trying to aggregate some info of a csv that looks like this

       A       B      C             D      E
1     12345    0    California      0      5.00 
2     12345    0    Florida         0      2.00
3     67898    0    Rhode Island    0      1.00
4     67898    0    North Carolina  0      3.50
5     44444    0    Alaska          0      6.00
6     44444    0    Texas           0      3.50

I want to group by column A, join by commas values on column C , display sum amount of rows that have same value of column A then export to csv

The csv will look like this

       A       B                              C      
1     12345    California, Florida            7.00 
2     67898    Rhode Island,North Carolina    4.50
3     44444    Alaska, Texas                  9.50

I have something like the following:

import pandas as pd

df = pd.read_csv('data.csv')

out = df.groupby('A', as_index=False, sort=False).agg({'C':'sum', 'C':'first'})

out.to_csv('out.csv')

Solution

  • You can used named aggregation in your agg() call to apply different functions to col C and E. For col C you want to join() the strings together. For col E you want to sum the grouped values together:

    df2 = df.groupby('A', as_index=False).agg(C=('C', lambda x: ', '.join(map(str, x))), E=('E', sum))
    

    df2:

           A                             C    E
    0  12345           California, Florida  7.0
    1  44444                 Alaska, Texas  9.5
    2  67898  Rhode Island, North Carolina  4.5