polars.when(cond).then().otherwise() evaluates .then() when cond is false


Say I have this:

import numpy
import polars

u = 2

df = (polars.DataFrame(dict(
  j=numpy.random.randint(10, 99, 10)
  ))
  .with_row_count()
  .with_columns(k=(polars.col('j') % 10) == u)
  .with_columns(l=polars.col('k').any())
  )
print(df)

print(df
  .with_columns(i=polars
    .when(polars.col('k').any())
    .then(polars.col('row_nr') > polars.col('row_nr').filter(polars.col('k')).first())
    .otherwise(None)
    )
  )

This produces the following output:

 row_nr (u32)  j (i64)  k (bool)  l (bool)
 0             47       false     true
 1             22       true      true
 2             82       true      true
 3             19       false     true
 4             85       false     true
 5             15       false     true
 6             89       false     true
 7             74       false     true
 8             26       false     true
 9             11       false     true
shape: (10, 4)
 row_nr (u32)  j (i64)  k (bool)  l (bool)  i (bool)
 0             47       false     true      false
 1             22       true      true      false
 2             82       true      true      true
 3             19       false     true      true
 4             85       false     true      true
 5             15       false     true      true
 6             89       false     true      true
 7             74       false     true      true
 8             26       false     true      true
 9             11       false     true      true
shape: (10, 5)

However, when no rows match the condition (e.g. with u = 0 in the code above):

 row_nr (u32)  j (i64)  k (bool)  l (bool)
 0             47       false     false
 1             22       false     false
 2             82       false     false
 3             19       false     false
 4             85       false     false
 5             15       false     false
 6             89       false     false
 7             74       false     false
 8             26       false     false
 9             11       false     false
shape: (10, 4)

I get this exception:

exceptions.ComputeError: cannot evaluate two series of different lengths (10 and 0)

Error originated in expression: '[(col("row_nr")) > (col("row_nr").filter(col("k")).first())]'

I know I can check this beforehand and then do something else, but I was wondering:

  • Why doesn't polars.when().then().otherwise() work in this case, given that .then() should not even be evaluated in this case (since .when(polars.col('k').any()) is false)?
  • Is there a way to do this within one expression (without going "outside" of the expression, i.e. reaching for pure python if/else, using pipe and such)?

Solution

  • Why doesn't polars.when().then().otherwise() work in this case, given that .then() should not even be evaluated in this case (since .when(polars.col('k').any()) is false)?

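    Polars evaluates the when(), then() and otherwise() expressions in parallel; there is no short-circuit evaluation. The then() expression is therefore computed even when the condition is false for every row, which is why the error is raised.

    This behavior is now noted in the documentation.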
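
To make the consequence concrete, here is a rough plain-Python analogy. It only illustrates eager branch evaluation and is not how Polars is implemented; the helper names first_or_raise, when_then_otherwise and eager_select are hypothetical. Both branch columns are built in full before the mask picks between them, so a failure while building the then-branch surfaces even when the mask is false everywhere.

```python
def first_or_raise(values):
    """Return the first element; fail on empty input (stand-in for the Polars error)."""
    if not values:
        raise ValueError("no matching rows")
    return values[0]


def when_then_otherwise(mask, then_values, otherwise_values):
    """Pick element-wise from two columns that were already fully computed."""
    return [t if m else o for m, t, o in zip(mask, then_values, otherwise_values)]


def eager_select(row_nr, k):
    mask = [any(k)] * len(row_nr)
    # The then-branch is computed for every row before the mask is consulted.
    threshold = first_or_raise([r for r, flag in zip(row_nr, k) if flag])
    then_values = [r > threshold for r in row_nr]
    otherwise_values = [None] * len(row_nr)
    return when_then_otherwise(mask, then_values, otherwise_values)


print(eager_select([0, 1, 2, 3], [False, True, True, False]))
try:
    eager_select([0, 1, 2, 3], [False, False, False, False])
except ValueError as exc:
    print("then-branch failed despite a false condition:", exc)
```

In this analogy the second call fails inside the then-branch computation, before the all-false mask ever gets a chance to select the otherwise values.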

  • Is there a way to do this within one expression (without going "outside" of the expression, i.e. reaching for pure python if/else, using pipe and such)?

    Here the error comes from calling first() on a potentially empty series. Appending None before first() guarantees that at least one value exists, so the greater-than comparison receives null when nothing matches. With u = 0, the i column is then filled with nulls and no error is raised.

    df.with_columns(
        i=polars.col("row_nr")
        > polars.col("row_nr")
            .filter(polars.col("k"))
            .append(None)
            .first()
    )
    
    ┌────────┬─────┬───────┬───────┬──────┐
    │ row_nr ┆ j   ┆ k     ┆ l     ┆ i    │
    │ ---    ┆ --- ┆ ---   ┆ ---   ┆ ---  │
    │ u32    ┆ i64 ┆ bool  ┆ bool  ┆ bool │
    ╞════════╪═════╪═══════╪═══════╪══════╡
    │ 0      ┆ 47  ┆ false ┆ false ┆ null │
    │ 1      ┆ 22  ┆ false ┆ false ┆ null │
    │ 2      ┆ 82  ┆ false ┆ false ┆ null │
    │ 3      ┆ 19  ┆ false ┆ false ┆ null │
    │ 4      ┆ 85  ┆ false ┆ false ┆ null │
    │ 5      ┆ 15  ┆ false ┆ false ┆ null │
    │ 6      ┆ 89  ┆ false ┆ false ┆ null │
    │ 7      ┆ 74  ┆ false ┆ false ┆ null │
    │ 8      ┆ 26  ┆ false ┆ false ┆ null │
    │ 9      ┆ 11  ┆ false ┆ false ┆ null │
    └────────┴─────┴───────┴───────┴──────┘