Polars Replacing Values of Other groups to the Values of a Certain Group

I have the following Polars.DataFrame:

df = pl.DataFrame(
    {
        "timestamp": [1, 2, 3, 1, 2, 3],
        "var1": [1, 2, 3, 3, 4, 5], 
        "group": ["a", "a", "a", "b", "b", "b"],
        }
    )
print(df)

out:
shape: (6, 3)
┌───────────┬──────┬───────┐
│ timestamp ┆ var1 ┆ group │
│ ---       ┆ ---  ┆ ---   │
│ i64       ┆ i64  ┆ str   │
╞═══════════╪══════╪═══════╡
│ 1         ┆ 1    ┆ a     │
│ 2         ┆ 2    ┆ a     │
│ 3         ┆ 3    ┆ a     │
│ 1         ┆ 3    ┆ b     │
│ 2         ┆ 4    ┆ b     │
│ 3         ┆ 5    ┆ b     │
└───────────┴──────┴───────┘

I want to replace the values of group b with the values of group a that are having the same timestamps.

Desired output:

shape: (6, 3)
┌───────────┬──────┬───────┐
│ timestamp ┆ var1 ┆ group │
│ ---       ┆ ---  ┆ ---   │
│ i64       ┆ i64  ┆ str   │
╞═══════════╪══════╪═══════╡
│ 1         ┆ 1    ┆ a     │
│ 2         ┆ 2    ┆ a     │
│ 3         ┆ 3    ┆ a     │
│ 1         ┆ 1    ┆ b     │
│ 2         ┆ 2    ┆ b     │
│ 3         ┆ 3    ┆ b     │
└───────────┴──────┴───────┘

I have the current solution with generating a helper df:

def group_value_replacer(
    df: pl.DataFrame, 
    target_group_col: str,
    target_var: str,
    target_group: str,
    ):
    
    helper_df = df.filter(pl.col(target_group_col) == target_group)
    df = df.drop(target_var).join(
        helper_df.drop(target_group_col),
        on=["timestamp"],
        how="left",
        )
    
    return df

group_value_replacer(df, "group", "var1", "a")

out：
shape: (6, 3)
┌───────────┬───────┬──────┐
│ timestamp ┆ group ┆ var1 │
│ ---       ┆ ---   ┆ ---  │
│ i64       ┆ str   ┆ i64  │
╞═══════════╪═══════╪══════╡
│ 1         ┆ a     ┆ 1    │
│ 2         ┆ a     ┆ 2    │
│ 3         ┆ a     ┆ 3    │
│ 1         ┆ b     ┆ 1    │
│ 2         ┆ b     ┆ 2    │
│ 3         ┆ b     ┆ 3    │
└───────────┴───────┴──────┘

I want to improve the solution by using Polars.Expr: For example, is there a way for me to achieve the same operation using expressions like df.with_columns(pl.col(target_var).operationxx).

Solution

I think for generic solution your approach with join works fine, you could probably try something like this as well:

filter() to filter var1 column to leave only values where group == a
first() to get the value.
over() to limit it to certain timestamp.
coalesce() to fallback to actual value if value for group == a doesn't exist.

df.with_columns(
    pl.coalesce(
        pl.col.var1.filter(pl.col.group == "a").first().over("timestamp"),
        pl.col.var1
    )
)

┌───────────┬──────┬───────┐
│ timestamp ┆ var1 ┆ group │
│ ---       ┆ ---  ┆ ---   │
│ i64       ┆ i64  ┆ str   │
╞═══════════╪══════╪═══════╡
│ 1         ┆ 1    ┆ a     │
│ 2         ┆ 2    ┆ a     │
│ 3         ┆ 3    ┆ a     │
│ 1         ┆ 1    ┆ b     │
│ 2         ┆ 2    ┆ b     │
│ 3         ┆ 3    ┆ b     │
└───────────┴──────┴───────┘

You can also skip coalesce() if you don't need a fallback.

df.with_columns(
    pl.col.var1.filter(pl.col.group == "a").first().over("timestamp")
)