Search code examples
pythonpandaspython-polars

Polars Replacing Values Greater than the Max of Another Polars DataFrame Within Groups


I have 2 DataFrames:

import polars as pl

df1 = pl.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B"],
        "index": [1, 3, 5, 1, 3, 8],
    }
)

df2 = pl.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B"],
        "index": [3, 4, 7, 2, 7, 10],
    }
)

I want to cap the index in df2 using the largest index of each group in df1. The groups in two DataFrames are the same.

expected output for df2:

shape: (6, 2)
┌───────┬───────┐
│ group ┆ index │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ A     ┆ 3     │
│ A     ┆ 4     │
│ A     ┆ 5     │
│ B     ┆ 2     │
│ B     ┆ 7     │
│ B     ┆ 8     │
└───────┴───────┘

Solution

  • You can compute the max per group over df1, then clip df2:

    out = df2.with_columns(
        pl.col('index').clip(
            upper_bound=df1.select(pl.col('index').max().over('group'))['index']
        )
    )
    
    

    Output:

    shape: (6, 2)
    ┌───────┬───────┐
    │ group ┆ index │
    │ ---   ┆ ---   │
    │ str   ┆ i64   │
    ╞═══════╪═══════╡
    │ A     ┆ 3     │
    │ A     ┆ 4     │
    │ A     ┆ 5     │
    │ B     ┆ 2     │
    │ B     ┆ 7     │
    │ B     ┆ 8     │
    └───────┴───────┘
    

    Alternatively, if the two groups are not necessarily the same in both dataframes, you could group_by.max then align with join:

    df1 = pl.DataFrame(
        {
            "group": ["A", "A", "A", "B", "B", "B"],
            "index": [1, 3, 5, 1, 3, 7],
        }
    )
    
    df2 = pl.DataFrame(
        {
            "group": ["A", "A", "A", "B", "B", "B", "B"],
            "index": [3, 4, 7, 2, 7, 8, 9],
        }
    )
    
    out = df2.with_columns(
        pl.col('index').clip(
            upper_bound=df2.join(df1.group_by('group').max(), on='group')['index_right']
        )
    )
    

    Output:

    shape: (7, 2)
    ┌───────┬───────┐
    │ group ┆ index │
    │ ---   ┆ ---   │
    │ str   ┆ i64   │
    ╞═══════╪═══════╡
    │ A     ┆ 3     │
    │ A     ┆ 4     │
    │ A     ┆ 5     │
    │ B     ┆ 2     │
    │ B     ┆ 7     │
    │ B     ┆ 7     │
    │ B     ┆ 7     │
    └───────┴───────┘