polars: lexicographically compare columns


I'm starting with this dataframe:

In [6]: df = pl.DataFrame({'a': [1, 1, 2, 0], 'b': [1, 4, 1, 5]})

In [7]: df
Out[7]:
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 1   ┆ 4   │
│ 2   ┆ 1   │
│ 0   ┆ 5   │
└─────┴─────┘

I'd like to filter on rows where (pl.col('a'), pl.col('b')) is greater than (1, 2), lexicographically. By that, I mean:

  • first, compare column 'a' to 1
  • then, only if 'a' equals 1, compare column 'b' to 2

So, for example:

  • (1, 1) < (1, 2)
  • (1, 3) > (1, 2)
  • (2, 1) > (1, 2)
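This is the same ordering Python already applies to tuples. As a minimal sketch, here are the four rows of the dataframe above written as plain tuples and compared against (1, 2); the names `target`, `rows`, and `kept` are only illustrative:

```python
# Reference point from the question: rows must be strictly greater than this.
target = (1, 2)

# The rows of the question's dataframe, written as plain tuples.
rows = [(1, 1), (1, 4), (2, 1), (0, 5)]

# Tuples compare element by element, left to right;
# the first position where they differ decides the result.
kept = [row for row in rows if row > target]
print(kept)  # [(1, 4), (2, 1)]
```

The rows that survive are the ones the filter should keep.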

I could only come up with a way to do this using map_rows:


In [8]: df.filter(df.map_rows(lambda row: (row[0], row[1]) > (1, 2))['map'])
Out[8]:
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 1   │
└─────┴─────┘

Is there a way to do it without map_rows?

EDIT

Note that this is not the same as just filtering on each column separately:

In [9]: df.filter((pl.col("a") > 1) | (pl.col("b") > 2))
Out[9]:
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 1   │
│ 0   ┆ 5   │
└─────┴─────┘

Solution

  • As far as I understand, you want ordinary tuple comparison. In that case my solution would be:

    import polars as pl
    
    df = pl.DataFrame({'a': [1, 1, 2], 'b': [1, 4, 1]})
    
    df.filter((pl.col("a") > 1) | ((pl.col("a") == 1) & (pl.col("b") > 2)))
    

    You can extend this into a small function:

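    def lex_greater(x: tuple, column_names: list[str]) -> pl.Expr:
        # True where the row (column_names...) is lexicographically greater than x.
        expr_collector = pl.lit(False)
        equal_expr = pl.lit(True)  # all earlier columns equal their targets
        for value, name in zip(x, column_names):
            # This column decides only if every earlier column tied.
            expr_collector = expr_collector | (equal_expr & (pl.col(name) > value))
            equal_expr = equal_expr & (pl.col(name) == value)
        return expr_collector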
    
    df.filter(lex_greater((1, 2), ["a", "b"]))