How can I perform operations between a list column and a scalar column in Polars?


In Python Polars, I was wondering whether .eval() can be used to perform an operation between each element of a list column and the value of another column. For example, given the following DataFrame:

import polars as pl

df = pl.DataFrame({"list": [[2, 2, 2], [3, 3, 3]], "scalar": [1, 2]})

Is it possible to subtract the value of the scalar column from each element of the list column? I.e., to go from this

shape: (2, 2)
┌───────────┬────────┐
│ list      ┆ scalar │
│ ---       ┆ ---    │
│ list[i64] ┆ i64    │
╞═══════════╪════════╡
│ [2, 2, 2] ┆ 1      │
│ [3, 3, 3] ┆ 2      │
└───────────┴────────┘

to this

shape: (2, 3)
┌───────────┬────────┬───────────┐
│ list      ┆ scalar ┆ diff      │
│ ---       ┆ ---    ┆ ---       │
│ list[i64] ┆ i64    ┆ list[i64] │
╞═══════════╪════════╪═══════════╡
│ [2, 2, 2] ┆ 1      ┆ [1, 1, 1] │
│ [3, 3, 3] ┆ 2      ┆ [1, 1, 1] │
└───────────┴────────┴───────────┘

Solution

  • I think native functionality for this is on the roadmap (see this GitHub issue: https://github.com/pola-rs/polars/issues/8006), but in the meantime you can do it as follows:

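    # Newer Polars releases name these methods with_row_index / group_by;
    # older releases spell them with_row_count / groupby.
    df = df.with_row_index("row_nr").pipe(
        lambda df: df.join(
            df.explode("list")  # one row per list element
            .with_columns(diff=pl.col("list") - pl.col("scalar"))
            .group_by("row_nr")  # gather the elements back per original row
            .agg(pl.col("diff")),
            on="row_nr",
        )
    ).drop("row_nr")  # the helper ID column is no longer needed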
    

    Basically, I add a row_nr column so that each row has a unique ID, then pipe the frame so that row_nr can be used in the following operations. Inside the join, I explode the list column so that each element becomes its own row, do the arithmetic, and group by row_nr to gather the results back into one list per original row. The join on row_nr then attaches this new column to the original df.

    I'm sure there are other ways to do it, but this should get you going.
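
To make the explode, subtract, regroup and join steps described above easier to follow, here is a minimal plain-Python sketch of the same logic. It does not use Polars at all. The function name subtract_scalar_from_lists and the list-of-dicts layout are illustrative assumptions rather than anything from the Polars API, and unlike Polars the sketch gives a row with an empty list an empty diff.

```python
# Plain-Python model of the explode -> subtract -> regroup -> join steps.
# Hypothetical helper for illustration only; not part of Polars.
rows = [
    {"list": [2, 2, 2], "scalar": 1},
    {"list": [3, 3, 3], "scalar": 2},
]


def subtract_scalar_from_lists(rows):
    # 1. Tag every row with a unique id (row_nr) and "explode" each list
    #    into flat (row_nr, element, scalar) records, one per element.
    exploded = [
        (row_nr, element, row["scalar"])
        for row_nr, row in enumerate(rows)
        for element in row["list"]
    ]

    # 2. Do the arithmetic on the flat records.
    subtracted = [
        (row_nr, element - scalar) for row_nr, element, scalar in exploded
    ]

    # 3. Group by row_nr to gather the results back into one list per row.
    grouped = {}
    for row_nr, value in subtracted:
        grouped.setdefault(row_nr, []).append(value)

    # 4. "Join" the regrouped lists back onto the original rows by row_nr.
    #    Assumption of this sketch: a row with an empty list gets [].
    return [
        {**row, "diff": grouped.get(row_nr, [])}
        for row_nr, row in enumerate(rows)
    ]


result = subtract_scalar_from_lists(rows)
print([row["diff"] for row in result])  # [[1, 1, 1], [1, 1, 1]]
```

The row_nr key plays the same role as in the Polars version: it is the only thing that ties each exploded element back to the row it came from.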