Is there a way to replicate the groupby -> transform functionality in Polars?
I use this for normalising over groups (make groups sum to one), i.e.
df['normalised'] = df.groupby(groupcols).transform(lambda x: x/x.sum())
In polars I tried a workaround by adding a column for the sum then dividing by that column:
df.join(
df.group_by(groupcols).agg(pl.col('VOL').sum().alias('VOLSUM'))),
left_on = [groupcols],
right_on = [groupcols]
).with_columns(
(pl.col('VOL') / pl.col('VOLSUM')).alias('VOLNORM')
).drop('VOLSUM')
However, I also need need to compute a cummulative sum within groups:
df['cummulativesum'] = df.groupby(groupcols)[col].transform('cumsum')
Hence, I have two questions:
Thanks in advance!
You can use Window functions
df = pl.DataFrame(dict(GROUP=[1, 2, 2, 1, 3], VOL=[1, 2, 1, 5, 1]))
shape: (5, 2)
┌───────┬─────┐
│ GROUP ┆ VOL │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════╪═════╡
│ 1 ┆ 1 │
│ 2 ┆ 2 │
│ 2 ┆ 1 │
│ 1 ┆ 5 │
│ 3 ┆ 1 │
└───────┴─────┘
.with_columns()
and .over()
:
df.with_columns(
VOLNORM = (pl.col.VOL / pl.col.VOL.sum()).over("GROUP"),
VOLCUMSUM = pl.col.VOL.cum_sum().over("GROUP")
)
shape: (5, 4)
┌───────┬─────┬──────────┬───────────┐
│ GROUP ┆ VOL ┆ VOLNORM ┆ VOLCUMSUM │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 ┆ i64 │
╞═══════╪═════╪══════════╪═══════════╡
│ 1 ┆ 1 ┆ 0.166667 ┆ 1 │
│ 2 ┆ 2 ┆ 0.666667 ┆ 2 │
│ 2 ┆ 1 ┆ 0.333333 ┆ 3 │
│ 1 ┆ 5 ┆ 0.833333 ┆ 6 │
│ 3 ┆ 1 ┆ 1.0 ┆ 1 │
└───────┴─────┴──────────┴───────────┘