Search code examples
pythonpandassumpandas-groupby

Sum based on multiple columns with pandas groupby


I want to create a new column that sums up the value column based on groupings of multiple columns. In this example I want to get the sum per ISIN, date and portfolio.

df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
                     "date": ["16", "16", "18", "18", "22"],
                     "portfolio": ["A", "A", "B", "A", "D"],
                     "value": [400, 300, 200, 600, 500]})

Here is the desired output. As you can see, only the first two rows "satisfies" the condition and both rows get the sum of 700. The others will keep their respective value.

df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
                     "date": ["16", "16", "18", "18", "22"],
                     "portfolio": ["A", "A", "B", "A", "D"],
                     "value": [400, 300, 200, 600, 500],
                     "Sum per ISIN, date and portfolio": [700, 700, 200, 600, 500]})

Here is what I have tried, but I am only able to make it work with grouping on one column, for example just ISIN.

df["Sum per ISIN, date and portfolio"] = df["value"].groupby(df["ISIN", "date", "portfolio"]).transform("sum")

Solution

  • Try groupby on the DataFrame instead of the Series (value) then select the column from the grouper:

    df["Sum per ISIN, date and portfolio"] = (
        df.groupby(["ISIN", "date", "portfolio"])["value"].transform("sum")
    )
    
        ISIN date portfolio  value  Sum per ISIN, date and portfolio
    0  IS123   16         A    400                               700
    1  IS123   16         A    300                               700
    2  UN123   18         B    200                               200
    3  UN123   18         A    600                               600
    4  FA123   22         D    500                               500