I have a df as follows:
with n known at runtime.
I need to count the 1 and -1 values in each row.
Namely, I need a new df (or new columns in the old one):
You can use polars.sum_horizontal with an expression to sum across the columns of each row.
For example, starting with this data:
import polars as pl
data_frame = (
    pl.DataFrame({
        'col0': [1, -1, 1, -1, 1],
        'col1': [1, 1, 1, 1, 1],
        'col2': [-1, -1, -1, -1, -1],
        'col3': [1, -1, -1, 1, 1],
    })
)
data_frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ -1   ┆ 1    │
│ -1   ┆ 1    ┆ -1   ┆ -1   │
│ 1    ┆ 1    ┆ -1   ┆ -1   │
│ -1   ┆ 1    ┆ -1   ┆ 1    │
│ 1    ┆ 1    ┆ -1   ┆ 1    │
└──────┴──────┴──────┴──────┘
We can sum all columns horizontally using polars.all:
(
    data_frame
    .with_columns(
        pl.sum_horizontal(pl.all() > 0).alias('pos'),
        pl.sum_horizontal(pl.all() < 0).alias('neg'),
    )
)
shape: (5, 6)
┌──────┬──────┬──────┬──────┬─────┬─────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 ┆ pos ┆ neg │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ i64  ┆ u32 ┆ u32 │
╞══════╪══════╪══════╪══════╪═════╪═════╡
│ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
│ -1   ┆ 1    ┆ -1   ┆ -1   ┆ 1   ┆ 3   │
│ 1    ┆ 1    ┆ -1   ┆ -1   ┆ 2   ┆ 2   │
│ -1   ┆ 1    ┆ -1   ┆ 1    ┆ 2   ┆ 2   │
│ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
└──────┴──────┴──────┴──────┴─────┴─────┘
This works because Polars casts boolean values to integers when summing. For example, the expression pl.all() > 0 produces one boolean column per input column:
(
    data_frame
    .with_columns(
        pl.all() > 0
    )
)
shape: (5, 4)
┌───────┬──────┬───────┬───────┐
│ col0  ┆ col1 ┆ col2  ┆ col3  │
│ ---   ┆ ---  ┆ ---   ┆ ---   │
│ bool  ┆ bool ┆ bool  ┆ bool  │
╞═══════╪══════╪═══════╪═══════╡
│ true  ┆ true ┆ false ┆ true  │
│ false ┆ true ┆ false ┆ false │
│ true  ┆ true ┆ false ┆ false │
│ false ┆ true ┆ false ┆ true  │
│ true  ┆ true ┆ false ┆ true  │
└───────┴──────┴───────┴───────┘
polars.sum_horizontal will then convert these to integers as it sums them horizontally.
For examples of how to select only certain columns (by name, by type, by regular expression, etc.), see this StackOverflow response.