I want to create a new column that sums up the value column based on groupings of multiple columns. In this example I want to get the sum per ISIN, date and portfolio.
df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
"date": ["16", "16", "18", "18", "22"],
"portfolio": ["A", "A", "B", "A", "D"],
"value": [400, 300, 200, 600, 500]})
Here is the desired output. As you can see, only the first two rows "satisfies" the condition and both rows get the sum of 700. The others will keep their respective value.
df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
"date": ["16", "16", "18", "18", "22"],
"portfolio": ["A", "A", "B", "A", "D"],
"value": [400, 300, 200, 600, 500],
"Sum per ISIN, date and portfolio": [700, 700, 200, 600, 500]})
Here is what I have tried, but I am only able to make it work with grouping on one column, for example just ISIN.
df["Sum per ISIN, date and portfolio"] = df["value"].groupby(df["ISIN", "date", "portfolio"]).transform("sum")
Try groupby
on the DataFrame instead of the Series (value
) then select the column from the grouper:
df["Sum per ISIN, date and portfolio"] = (
df.groupby(["ISIN", "date", "portfolio"])["value"].transform("sum")
)
ISIN date portfolio value Sum per ISIN, date and portfolio
0 IS123 16 A 400 700
1 IS123 16 A 300 700
2 UN123 18 B 200 200
3 UN123 18 A 600 600
4 FA123 22 D 500 500