I am trying to add a column using map_rows in polars. The pandas equivalent is as follows:
import pandas as pd
df = pd.DataFrame({"ref": [-1, 2, 8], "v1": [-1, 5, 0], "v2": [-1, 5, 8]})
df['count'] = df.apply(lambda r: len([i for i in r if i == r.iloc[0]]) - 1, axis=1)
df = df.drop('ref', axis=1)
df
   v1  v2  count
0  -1  -1      2
1   5   5      0
2   0   8      1
The following is the sample code that I have with polars. Though it works as desired, it looks ugly and probably can be improved as well.
import polars as pl
df = pl.DataFrame({"ref": [-1, 2, 8], "v1": [-1, 5, 0], "v2": [-1, 5, 8]})
x = df.map_rows(lambda r: len([i for i in r if i == r[0]]) - 1).rename({'map': 'count'})
df = df.hstack([x.to_series()]).drop('ref')
df
shape: (3, 3)
┌─────┬─────┬───────┐
│ v1 ┆ v2 ┆ count │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═══════╡
│ -1 ┆ -1 ┆ 2 │
│ 5 ┆ 5 ┆ 0 │
│ 0 ┆ 8 ┆ 1 │
└─────┴─────┴───────┘
What bothers me is the rename part and the hstack that I cobbled together to make it work.
I would be grateful for any improvements in the above code.
TIA
The idea is to use Polars Expressions instead of applying custom Python functions/lambdas.
It looks like you're trying to count, for each row, how many of the other columns have the same value as ref?
df.select(pl.exclude("ref") == pl.col("ref"))
shape: (3, 2)
┌───────┬───────┐
│ v1 ┆ v2 │
│ --- ┆ --- │
│ bool ┆ bool │
╞═══════╪═══════╡
│ true ┆ true │
│ false ┆ false │
│ false ┆ true │
└───────┴───────┘
pl.sum_horizontal() can be used to get a "count" of the true values on each row.
df.with_columns(count = pl.sum_horizontal(pl.exclude("ref") == pl.col("ref")))
shape: (3, 4)
┌─────┬─────┬─────┬───────┐
│ ref ┆ v1 ┆ v2 ┆ count │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ u32 │
╞═════╪═════╪═════╪═══════╡
│ -1 ┆ -1 ┆ -1 ┆ 2 │
│ 2 ┆ 5 ┆ 5 ┆ 0 │
│ 8 ┆ 0 ┆ 8 ┆ 1 │
└─────┴─────┴─────┴───────┘