Using Pandas Im trying to aggregate some info of a csv that looks like this
A B C D E
1 12345 0 California 0 5.00
2 12345 0 Florida 0 2.00
3 67898 0 Rhode Island 0 1.00
4 67898 0 North Carolina 0 3.50
5 44444 0 Alaska 0 6.00
6 44444 0 Texas 0 3.50
I want to group by column A, join by commas values on column C , display sum amount of rows that have same value of column A then export to csv
The csv will look like this
A B C
1 12345 California, Florida 7.00
2 67898 Rhode Island,North Carolina 4.50
3 44444 Alaska, Texas 9.50
I have something like the following:
import pandas as pd
df = pd.read_csv('data.csv')
out = df.groupby('A', as_index=False, sort=False).agg({'C':'sum', 'C':'first'})
out.to_csv('out.csv')
You can used named aggregation
in your agg()
call to apply different functions to col C
and E
. For col C
you want to join()
the strings together. For col E
you want to sum
the grouped values together:
df2 = df.groupby('A', as_index=False).agg(C=('C', lambda x: ', '.join(map(str, x))), E=('E', sum))
df2
:
A C E
0 12345 California, Florida 7.0
1 44444 Alaska, Texas 9.5
2 67898 Rhode Island, North Carolina 4.5