Polars Selecting all columns without NaNs

I have a dataframe where a number of the columns only consists of NaNs. I am trying to select only the columns in the dataframe where all the values are not equal to NaNs using Polars.

I have tried seeing if I could use a similar syntax to how I would proceed in Pandas e.g.

df[df.columns[~df.isnull().all()]]

However the syntax doesn't translate. I also know that you can use pl.filter but this only filters rows and not columns based on the criteria's applied within the filter expression.

Solution

So this is basically subsetting columns with a boolean mask.

So first let's create some sample data:

import polars as pl
import numpy as np

df = pl.DataFrame({
    "a": [np.nan, np.nan, np.nan, np.nan],
    "b": [3.0, 4.0, np.nan, 5.0], 
    "c": [np.nan, np.nan, np.nan, np.nan]
})

Next we have to get if a column consists completely of NaN Values

df.select(pl.all().is_not_nan().any())

shape: (1, 3)
┌───────┬──────┬───────┐
│ a     ┆ b    ┆ c     │
│ ---   ┆ ---  ┆ ---   │
│ bool  ┆ bool ┆ bool  │
╞═══════╪══════╪═══════╡
│ false ┆ true ┆ false │
└───────┴──────┴───────┘

To get this DataFrame as iterable we use the row function

df.select(pl.all().is_not_nan().any()).row(0)
# (False, True, False)

This we can now use in the bracket notation

df[:, df.select(pl.all().is_not_nan().any()).row(0)]

shape: (4, 1)
┌─────┐
│ b   │
│ --- │
│ f64 │
╞═════╡
│ 3.0 │
│ 4.0 │
│ NaN │
│ 5.0 │
└─────┘

Since in general bracket notation is not recommended we can do this also with select: (for looking more concise we use the compress function from itertools)

df.select(col for col in df if col.is_not_nan().any())

shape: (4, 1)
┌─────┐
│ b   │
│ --- │
│ f64 │
╞═════╡
│ 3.0 │
│ 4.0 │
│ NaN │
│ 5.0 │
└─────┘