Search code examples
pythonpandasdataframegroup-by

How to group a DataFrame by a column while combining strings and summing floats


I am having trouble grouping my df by ZIP Code, and other answers I've found don't seem to work for my Dataframe, or I am not good enough to adjust the code for my needs.

I have a table that looks like the following:

    ID      Name       City               ZIP          LAT   LNG     Sum1
0   100     Muffin     Parkwoods          99101        48    117     100
1   101     Cake       Victoria Village   12512        41    74      250
2   102     Donut      Parkwoods          99101        48    117     150
3   103     Milk       Victoria Village   12512        41    74      75
4   104     Cookies    Wharf              44101        41    81      25
5   105     Candy      Wharf              44101        41    81      115

I am hoping to create the following output:

    ZIP     ID          Name              City              LAT   LNG     Sum1
0   99101   100, 102    Muffin, Donut     Parkwoods         48    117     250
1   12512   101, 103    Cake, Milk        Victoria Village  41    74      325
2   44101   104, 105    Cookies Candy     Wharf             48    117     140

I am thinking that I could apply this process to only the columns I need combined and then merge it back with the full data after the fact.

For example, ZIP will give me the city, lat, and lng columns so I do not need to worry about these in my grouping.

The issue I am having is combining the string columns (such as ID and name) together, and then summing the float columns. Any help would be greatly appreciated.


Solution

  • You can use .groupby() with .agg(), predefining a lambda for the desired string operation:

    string_lambda = lambda x: ", ".join(map(str, x))
    df = df.groupby("ZIP").agg({
        "ID": string_lambda,
        "Name": string_lambda,
        "City": "first",
        "LAT": "first",
        "LNG": "first",
        "Sum1": "sum"
    })
    print(df)
    

    This outputs:

                 ID            Name              City  LAT  LNG  Sum1
    ZIP
    12512  101, 103      Cake, Milk  Victoria Village   41   74   325
    44101  104, 105  Cookies, Candy             Wharf   41   81   140
    99101  100, 102   Muffin, Donut         Parkwoods   48  117   250