Tags: python, dataframe, python-polars

Filter polars dataframe on records where column values differ, catching nulls


Have:

import polars as pl
df = pl.DataFrame({'col1': [1,2,3], 'col2': [1, None, None]})

In polars dataframes, those Nones become nulls:

> df
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ 1    │
│ 2    ┆ null │
│ 3    ┆ null │
└──────┴──────┘

Want: some command that returns the last two rows of df, since in those rows col1 (2 and 3) differs from col2 (null)

Tried:

Everything I've thought to try seems to drop/ignore records where one column is null:

  • df.filter(pl.col('col1')!=pl.col('col2')) # returns no rows
  • df.filter(~pl.col('col1')==pl.col('col2')) # returns no rows
  • df.filter(~pl.col('col1').eq(pl.col('col2'))) # returns no rows
  • ...

Solution

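  • This is mentioned near the end of the .filter() docs: a comparison involving null evaluates to null, and filter() drops rows where the predicate is null.

    Polars has null-aware ("missing") comparison methods, eq_missing and ne_missing, which treat null as an ordinary value: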

    df.filter(pl.col.col1.ne_missing(pl.col.col2))
    
    shape: (2, 2)
    ┌──────┬──────┐
    │ col1 ┆ col2 │
    │ ---  ┆ ---  │
    │ i64  ┆ i64  │
    ╞══════╪══════╡
    │ 2    ┆ null │
    │ 3    ┆ null │
    └──────┴──────┘