python, python-polars

How do you express the identity expression?


How do you express the identity expression in Polars?

By this I mean an expression idexpr such that lf.filter(idexpr) returns the entirety of lf.

Similar to SELECT * in SQL.

I'm resorting to a logical expression like

idexpr = (pl.col("a") == 0) | (pl.col("a") != 0)

Solution

  • According to the documentation, the predicate passed to filter needs to be "Expression(s) that evaluates to a boolean Series."

    You already know this, since you are passing a logical expression to work around it. Fortunately, there is a very simple expression that always evaluates to true: pl.lit(True), or just True.

    import polars as pl
    
    df = pl.DataFrame({
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "age": [24, 28, 23],
        "bool": [True, False, True]
    })
    
    print(df.filter(pl.lit(True)))
    

    This prints:

    shape: (3, 4)
    ┌─────┬─────────┬─────┬───────┐
    │ id  ┆ name    ┆ age ┆ bool  │
    │ --- ┆ ---     ┆ --- ┆ ---   │
    │ i64 ┆ str     ┆ i64 ┆ bool  │
    ╞═════╪═════════╪═════╪═══════╡
    │ 1   ┆ Alice   ┆ 24  ┆ true  │
    │ 2   ┆ Bob     ┆ 28  ┆ false │
    │ 3   ┆ Charlie ┆ 23  ┆ true  │
    └─────┴─────────┴─────┴───────┘
    

    Edit:

    Careful, though: I found at least two cases where this does not work.

    Series: pl.lit(True) does not seem to work for a Series at all (polars 1.15):

    >>> series = pl.Series("A", ["test", "test"])
    >>> series.filter(pl.lit(True))
    ...
    AttributeError: 'Expr' object has no attribute '_s'
    

    I assume this is because pl.Series.filter only accepts a mask anyway. This works, however:

    series.filter([True])
    

    It also seems to work for a DataFrame, so if you want one solution for both Series and DataFrames, this is the one.

    Null column: pl.lit(True) fails when there is a column of type null (polars 1.12; this seems to be fixed in 1.15):

    >>> null_frame = pl.DataFrame({"A": [None, None], "B": ["test", "test"]})
    >>> null_frame.filter(pl.lit(True))
    ...
    polars.exceptions.ShapeError: filter's length: 1 differs from that of the series: 2