I have a Polars dataframe and I want to calculate a weighted sum of a particular column, where the weights are just the positive integer sequence 1, 2, 3, ...
For example, assume I have the following dataframe.
import polars as pl
df = pl.DataFrame({"a": [2, 4, 2, 1, 2, 1, 3, 6, 7, 5]})
The result I want is
218 (= 2*1 + 4*2 + 2*3 + 1*4 + ... + 7*9 + 5*10)
How can I achieve this using only general Polars expressions? (I want to stick to Polars expressions for speed.)
Note: The example has only 10 numbers for simplicity, but in general the dataframe height can be any positive number.
Thanks for your help.
Such a weighted sum can be calculated using the dot product (the .dot() method). To generate the weights from 1 to n, you can use pl.int_range(1, n+1).
If you just need the value of the weighted sum:
df.select(
    pl.col("a").dot(pl.int_range(1, pl.len()+1))
)  # append .item() to get the scalar value (218)
To keep the dataframe and add the result as a column:
df.with_columns(
    pl.col("a").dot(pl.int_range(1, pl.len()+1)).alias("weighted_sum")
)
┌─────┬──────────────┐
│ a   ┆ weighted_sum │
│ --- ┆ ---          │
│ i64 ┆ i64          │
╞═════╪══════════════╡
│ 2   ┆ 218          │
│ 4   ┆ 218          │
│ ... ┆ ...          │
│ 3   ┆ 218          │
│ 5   ┆ 218          │
└─────┴──────────────┘
In a group_by context:
df.group_by("some_cat_col", maintain_order=True).agg(
    pl.col("a").dot(pl.int_range(1, pl.len()+1))
)