I'm starting with this dataframe:
In [6]: df = pl.DataFrame({'a': [1, 1, 2, 0], 'b': [1, 4, 1, 5]})
In [7]: df
Out[7]:
shape: (4, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 1 │
│ 1 ┆ 4 │
│ 2 ┆ 1 │
│ 0 ┆ 5 │
└─────┴─────┘
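I'd like to filter on rows where (pl.col('a'), pl.col('b')) is greater than (1, 2), lexicographically. By that, I mean: first compare 'a' to 1, and only if they are equal, compare 'b' to 2.
So, for example, the rows (1, 4) and (2, 1) should be kept, while (1, 1) and (0, 5) should not.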
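For reference, this is exactly how Python's built-in tuple comparison behaves. Below is a small plain-Python sketch of the intended semantics, using the four rows of df hard-coded as tuples. `lex_gt_2` is a hypothetical helper written for illustration, not a Polars function.

```python
# Rows of df above, written as plain tuples (a, b).
rows = [(1, 1), (1, 4), (2, 1), (0, 5)]
threshold = (1, 2)

# Built-in tuple comparison is lexicographic: compare the first elements,
# and only if they are equal, compare the second elements.
by_tuple = [r for r in rows if r > threshold]

# Hypothetical helper: the same test written out explicitly for two elements.
def lex_gt_2(a, b, t):
    return a > t[0] or (a == t[0] and b > t[1])

by_rule = [r for r in rows if lex_gt_2(*r, threshold)]
print(by_tuple)  # [(1, 4), (2, 1)]
print(by_rule)   # [(1, 4), (2, 1)]
```

The explicit form in `lex_gt_2` only uses element-wise `>`, `==`, `and` and `or`, which is the shape a vectorized expression (with `&` and `|` in place of `and` and `or`) can reproduce without per-row Python calls.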
I could only come up with a way to do this using map_rows:
In [8]: df.filter(df.map_rows(lambda row: (row[0], row[1]) > (1, 2))['map'])
Out[8]:
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 1 │
└─────┴─────┘
Is there a way to do it without map_rows?
Note that this is not the same as filtering on each column separately; the row (0, 5) is wrongly kept here:
In [9]: df.filter((pl.col("a") > 1) | (pl.col("b") > 2))
Out[9]:
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 1 │
│ 0 ┆ 5 │
└─────┴─────┘
As far as I understand, you want ordinary (lexicographic) tuple comparison. My solution would then be:
import polars as pl
df = pl.DataFrame({'a': [1, 1, 2, 0], 'b': [1, 4, 1, 5]})
# (a, b) > (1, 2)  <=>  a > 1, or (a == 1 and b > 2)
df.filter((pl.col("a") > 1) | ((pl.col("a") == 1) & (pl.col("b") > 2)))
You can generalize this into a small function:
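def lex_greater(x: tuple, column_names: list[str]) -> pl.Expr:
    """True where the row tuple of column_names is lexicographically greater than x."""
    expr_collector = pl.lit(False)  # OR of "strictly greater at position k" terms
    equal_expr = pl.lit(True)       # True while ALL earlier columns equal their values
    for value, name in zip(x, column_names):
        expr_collector = expr_collector | (equal_expr & (pl.col(name) > value))
        equal_expr = equal_expr & (pl.col(name) == value)  # accumulate, don't overwrite
    return expr_collector
df.filter(lex_greater((1, 2), ["a", "b"]))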