In Python Polars, is it possible to use .eval()
to perform an operation between each element of a list column and the value in another column? For example, given the following dataframe:
import polars as pl
df = pl.DataFrame({"list": [[2, 2, 2], [3, 3, 3]], "scalar": [1, 2]})
Is it possible to subtract the value of the scalar column from each element of the list column?
i.e. from this
shape: (2, 2)
┌───────────┬────────┐
│ list      ┆ scalar │
│ ---       ┆ ---    │
│ list[i64] ┆ i64    │
╞═══════════╪════════╡
│ [2, 2, 2] ┆ 1      │
│ [3, 3, 3] ┆ 2      │
└───────────┴────────┘
to this
shape: (2, 3)
┌───────────┬────────┬───────────┐
│ list      ┆ scalar ┆ diff      │
│ ---       ┆ ---    ┆ ---       │
│ list[i64] ┆ i64    ┆ list[i64] │
╞═══════════╪════════╪═══════════╡
│ [2, 2, 2] ┆ 1      ┆ [1, 1, 1] │
│ [3, 3, 3] ┆ 2      ┆ [1, 1, 1] │
└───────────┴────────┴───────────┘
I think native functionality for this is on the roadmap (see GitHub issue https://github.com/pola-rs/polars/issues/8006), but in the meantime you can do it as follows:
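# Note: newer Polars releases rename with_row_count to with_row_index
# (default column name "index") and groupby to group_by.
df = df.with_row_count().pipe(
    lambda df: df.join(
        df.explode("list")
        .with_columns(sub=pl.col("list") - pl.col("scalar"))
        .groupby("row_nr")
        .agg(pl.col("sub")),
        on="row_nr",
    )
)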
Basically, I add a row_nr column so that each row has a unique ID, then use pipe so that this row_nr column is available in the operations that follow. Inside the join, I explode the list column so that each element becomes its own row, do the arithmetic, and then groupby row_nr to gather the results back into one list per original row. Finally, the join attaches this new column to the original df.
I'm sure there are other ways to do it, but this should get you going.