Dividing a data frame group-wise


I'd like to divide each value in a polars dataframe by the 50% quantile of its group.

My attempt, which does not work:

df.select(pl.col('Value')) / df.group_by('Group').quantile(.5, 'linear')

With the following dataframe:

import polars as pl

df = pl.DataFrame(
    [
        ["A", "A", "A", "A", "B", "B", "B", "B"],
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    ],
    schema=["Group", "Value"],
)

I'd expect the following result:

shape: (8, 2)
┌───────┬──────────┐
│ Group ┆ Value    │
│ ---   ┆ ---      │
│ str   ┆ f64      │
╞═══════╪══════════╡
│ A     ┆ 0.4      │
│ A     ┆ 0.8      │
│ A     ┆ 1.2      │
│ A     ┆ 1.6      │
│ B     ┆ 0.769231 │
│ B     ┆ 0.923077 │
│ B     ┆ 1.076923 │
│ B     ┆ 1.230769 │
└───────┴──────────┘

I'm also happy with a series as the result, as long as I can concatenate it back onto the original dataframe.


Solution

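  • You can use a window function with over("Group") instead of group_by. The window expression computes the quantile per group and broadcasts it back to every row of that group, so the result has the same length as the original column and the division lines up row by row: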

    quantile = pl.col("Value").quantile(.5, 'linear').over("Group")
    
    df.with_columns(
        pl.col('Value') / quantile
    )
    
    ┌───────┬──────────┐
    │ Group ┆ Value    │
    │ ---   ┆ ---      │
    │ str   ┆ f64      │
    ╞═══════╪══════════╡
    │ A     ┆ 0.4      │
    │ A     ┆ 0.8      │
    │ A     ┆ 1.2      │
    │ A     ┆ 1.6      │
    │ B     ┆ 0.769231 │
    │ B     ┆ 0.923077 │
    │ B     ┆ 1.076923 │
    │ B     ┆ 1.230769 │
    └───────┴──────────┘