I am having trouble grouping my df by ZIP Code, and other answers I've found don't seem to work for my Dataframe, or I am not good enough to adjust the code for my needs.
I have a table that looks like the following:
ID Name City ZIP LAT LNG Sum1
0 100 Muffin Parkwoods 99101 48 117 100
1 101 Cake Victoria Village 12512 41 74 250
2 102 Donut Parkwoods 99101 48 117 150
3 103 Milk Victoria Village 12512 41 74 75
4 104 Cookies Wharf 44101 41 81 25
5 105 Candy Wharf 44101 41 81 115
I am hoping to create the following output:
ZIP ID Name City LAT LNG Sum1
0 99101 100, 102 Muffin, Donut Parkwoods 48 117 250
1 12512 101, 103 Cake, Milk Victoria Village 41 74 325
2 44101 104, 105 Cookies Candy Wharf 48 117 140
I am thinking that I could apply this process to only the columns I need combined and then merge it back with the full data after the fact.
For example, ZIP will give me the city, lat, and lng columns so I do not need to worry about these in my grouping.
The issue I am having is combining the string columns (such as ID and name) together, and then summing the float columns. Any help would be greatly appreciated.
You can use .groupby()
with .agg()
, predefining a lambda for the desired string operation:
string_lambda = lambda x: ", ".join(map(str, x))
df = df.groupby("ZIP").agg({
"ID": string_lambda,
"Name": string_lambda,
"City": "first",
"LAT": "first",
"LNG": "first",
"Sum1": "sum"
})
print(df)
This outputs:
ID Name City LAT LNG Sum1
ZIP
12512 101, 103 Cake, Milk Victoria Village 41 74 325
44101 104, 105 Cookies, Candy Wharf 41 81 140
99101 100, 102 Muffin, Donut Parkwoods 48 117 250