Say I have this:
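import numpy
import polars

u = 2  # last digit to look for in column j
df = (
    polars.DataFrame(dict(
        j=numpy.random.randint(10, 99, 10),
    ))
    .with_row_count()
    # k marks the rows whose last digit equals u
    .with_columns(k=(polars.col('j') % 10) == u)
    # l is true on every row if any row matched
    .with_columns(l=polars.col('k').any())
)
print(df)
print(
    df.with_columns(
        # i should be true for rows after the first match, or null if nothing matched
        i=polars
        .when(polars.col('k').any())
        .then(polars.col('row_nr') > polars.col('row_nr').where(polars.col('k')).first())
        .otherwise(None)
    )
)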
This produces this:
row_nr (u32) j (i64) k (bool) l (bool)
0 47 false true
1 22 true true
2 82 true true
3 19 false true
4 85 false true
5 15 false true
6 89 false true
7 74 false true
8 26 false true
9 11 false true
shape: (10, 4)
row_nr (u32) j (i64) k (bool) l (bool) i (bool)
0 47 false true false
1 22 true true false
2 82 true true true
3 19 false true true
4 85 false true true
5 15 false true true
6 89 false true true
7 74 false true true
8 26 false true true
9 11 false true true
shape: (10, 5)
However, when no conditions match, e.g. in the above with u = 0:
row_nr (u32) j (i64) k (bool) l (bool)
0 47 false false
1 22 false false
2 82 false false
3 19 false false
4 85 false false
5 15 false false
6 89 false false
7 74 false false
8 26 false false
9 11 false false
shape: (10, 4)
I get this exception:
exceptions.ComputeError: cannot evaluate two series of different lengths (10 and 0)
Error originated in expression: '[(col("row_nr")) > (col("row_nr").filter(col("k")).first())]'
I know I can check this beforehand and then do something else, but I was wondering:
Why doesn't polars.when().then().otherwise() work in this case, given that .then() should not even be evaluated in this case (since .when(polars.col('k').any()) is false)?
Is there a way to do this within one expression (without going "outside" of the expression, i.e. reaching for pure python if/else, using pipe and such)?

Why doesn't polars.when().then().otherwise() work in this case, given that .then() should not even be evaluated in this case (since .when(polars.col('k').any()) is false)?
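A Polars when() expression evaluates its when(), then() and otherwise() portions in parallel. There is no short-circuit evaluation: every branch is computed over the whole column and the results are combined afterwards, so each branch has to be valid on its own. Here the then() branch still compares the 10-row row_nr column against an empty filter result, which is what raises the length error.
This behavior is now noted in the documentation.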
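
As a rough analogy in plain Python (this is not how Polars is implemented; select, first_match and compute_i are names made up for the illustration), a when/then/otherwise that receives fully computed branches behaves like the sketch below. The then values are built before the condition is ever consulted, so a failure inside that branch surfaces no matter what the mask says.

```python
def select(mask, then_values, otherwise_values):
    """Pick then_values[i] where mask[i] is true, else otherwise_values[i]."""
    return [t if m else o for m, t, o in zip(mask, then_values, otherwise_values)]


def first_match(values, mask):
    """Return the first value whose mask entry is true (IndexError if none)."""
    return [v for v, m in zip(values, mask) if m][0]


def compute_i(row_nr, k):
    # Both branches are fully evaluated *before* select() runs,
    # which mimics the missing short-circuit described above.
    then_values = [r > first_match(row_nr, k) for r in row_nr]
    otherwise_values = [None] * len(row_nr)
    return select([any(k)] * len(row_nr), then_values, otherwise_values)


row_nr = list(range(4))
print(compute_i(row_nr, [False, True, True, False]))  # [False, False, True, True]
try:
    compute_i(row_nr, [False] * 4)
except IndexError as exc:
    print("then-branch failed even though the condition is false:", exc)
```

The second call fails inside the then branch even though the condition is false for every row, which is the same shape of problem as the ComputeError in the question.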
Is there a way to do this within one expression (without going "outside" of the expression, i.e. reaching for pure python if/else, using pipe and such)?
In this case, because first() is used on a potentially empty series, append(None) can be used to supply a default/otherwise value of None to be used in the greater-than comparison. This will fill the i column with all nulls and will avoid the error.
df.with_columns(
    i=polars.col("row_nr")
    > polars.col("row_nr")
    .where(polars.col("k"))
    .append(None)
    .first()
)
┌────────┬─────┬───────┬───────┬──────┐
│ row_nr ┆ j ┆ k ┆ l ┆ i │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ bool ┆ bool ┆ bool │
╞════════╪═════╪═══════╪═══════╪══════╡
│ 0 ┆ 47 ┆ false ┆ false ┆ null │
│ 1 ┆ 22 ┆ false ┆ false ┆ null │
│ 2 ┆ 82 ┆ false ┆ false ┆ null │
│ 3 ┆ 19 ┆ false ┆ false ┆ null │
│ 4 ┆ 85 ┆ false ┆ false ┆ null │
│ 5 ┆ 15 ┆ false ┆ false ┆ null │
│ 6 ┆ 89 ┆ false ┆ false ┆ null │
│ 7 ┆ 74 ┆ false ┆ false ┆ null │
│ 8 ┆ 26 ┆ false ┆ false ┆ null │
│ 9 ┆ 11 ┆ false ┆ false ┆ null │
└────────┴─────┴───────┴───────┴──────┘