How do you express the identity expression in Polars?
By this I mean the expression idexpr such that lf.filter(idexpr) returns the entirety of lf, similar to SELECT * in SQL.
I'm resorting to a logical expression like
idexpr = (pl.col("a") == 0) | (pl.col("a") != 0)
According to the documentation, the predicate passed to filter needs to be "Expression(s) that evaluates to a boolean Series."
You already know this, since you are passing a logical expression as a workaround. Fortunately, there is a very simple expression that always evaluates to true: pl.lit(True), or just True.
import polars as pl
df = pl.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [24, 28, 23],
    "bool": [True, False, True],
})
print(df.filter(pl.lit(True)))
This gives the output of:
shape: (3, 4)
┌─────┬─────────┬─────┬───────┐
│ id  ┆ name    ┆ age ┆ bool  │
│ --- ┆ ---     ┆ --- ┆ ---   │
│ i64 ┆ str     ┆ i64 ┆ bool  │
╞═════╪═════════╪═════╪═══════╡
│ 1   ┆ Alice   ┆ 24  ┆ true  │
│ 2   ┆ Bob     ┆ 28  ┆ false │
│ 3   ┆ Charlie ┆ 23  ┆ true  │
└─────┴─────────┴─────┴───────┘
Edit:
Careful though, I found at least two cases where this does not work.
Series: This does not seem to work for a Series at all (Polars 1.15):
>>> series = pl.Series("A", ["test", "test"])
>>> series.filter(pl.lit(True))
...
AttributeError: 'Expr' object has no attribute '_s'
I assume this is because pl.Series.filter only accepts a mask anyway. This works, however:
series.filter([True])
It also seems to work for a DataFrame, so if you want to use the same solution for both Series and DataFrames, this is the one.
Null column: When there's a column of type null (Polars 1.12; seems to be fixed in 1.15, though):
>>> null_frame = pl.DataFrame({"A": [None, None], "B": ["test", "test"]})
>>> null_frame.filter(pl.lit(True))
...
polars.exceptions.ShapeError: filter's length: 1 differs from that of the series: 2