Search code examples
python-polars

Polars: add the sum of some columns inside select/with_column call


Update 2024-08-22: this question refers to an old polars version (<0.19) and can now be handled with pl.sum_horizontal

I would like to add a column that is the sum of all columns but some id columns with polars. This can be done using polars.DataFrame.sum(axis=1):

import polars as pl
df = pl.DataFrame(
    {
        "id": [1, 2],
        "cat_a": [2, 7],
        "cat_b": [5, 1],
        "cat_c": [0, 3]
    }
)
df["cat_total"] = df.select(pl.all().exclude("id")).sum(axis=1)
df

However, this really feels like pandas style. I would prefer to be able to have this inside a longer sequence of calls inside a select or with_column call:

# Throws TypeError: sum() got an unexpected keyword argument 'axis'
# because polars.Expr.sum does not support choosing an axis
(df
     # [...]
    .with_column(pl.all().exclude("id").sum(axis=1).alias("cat_total"))
     # [...]
)

How can this be done (without explicitly identifying the column names)?


Solution

  • Update 2024-08-22: this answer refers to an old polars version (<0.19). Since 0.19 the canonical solution would be df.with_columns(pl.sum_horizontal(pl.exclude("id").alias("sum_horizontal"))

    You can use a fold expression, which takes an accumulator: acc, a binary function Fn(acc, Series) -> Series and one or more expression to apply the fold on.

    df.with_columns(
        pl.fold(0, lambda acc, s: acc + s, pl.exclude("id")).alias("sum_horizontal")
    )
    

    This would output:

    shape: (2, 5)
    ┌─────┬───────┬───────┬───────┬────────────────┐
    │ id  ┆ cat_a ┆ cat_b ┆ cat_c ┆ sum_horizontal │
    │ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---            │
    │ i64 ┆ i64   ┆ i64   ┆ i64   ┆ i64            │
    ╞═════╪═══════╪═══════╪═══════╪════════════════╡
    │ 1   ┆ 2     ┆ 5     ┆ 0     ┆ 7              │
    │ 2   ┆ 7     ┆ 1     ┆ 3     ┆ 11             │
    └─────┴───────┴───────┴───────┴────────────────┘