polars: compute row-wise quantile over DataFrame

I have some polars DataFrames over which I want to compute some row-wise statistics.

For some there is a .list.func function which exists (eg list.mean), however, for those which don't have a dedicated function I believe I must use list.eval.

For the following example data:

df = pl.DataFrame({
    'a': [1,10,1,.1,.1,     np.NAN],
    'b': [2, 8,1,.2, np.NAN,np.NAN],
    'c': [3, 6,2,.3,.2,     np.NAN],
    'd': [4, 4,3,.4, np.NAN,np.NAN],
    'e': [5, 2,3,.5,.3,     np.NAN],
}, strict=False)

I have managed to come up with the following expression.

It seems that list.eval returns a list (which I suppose is more generic) so I need to call .explode on the resulting 1-element list to get back a single value.

The resulting column takes the name of the first column, so I then need to call .alias to give it a more meaningful name.

df.select(
    pl.concat_list(
        pl.all().fill_nan(None)
    )
    .list.eval(pl.element().quantile(0.25))
    .explode()
    .alias('q1')
)

Is this the recommended way of computing row-wise?

Solution

I would unpivot and join here. It should be faster than .list.eval plus it let's you more easily add other row wise aggregations. Note I've added q2,q3,q4 to the agg

(
    (_df:=df.with_row_index('i'))
    .join(
        _df
        .unpivot(index='i')
        .group_by('i')
        .agg(
            pl.col('value').quantile(x).alias(q)
            for q,x in {'q1':0.25,'q2':0.50, 'q3':0.75, 'q4':1}.items()
            ), 
        on='i'
        )
    .sort('i')
    .drop('i')
    )

shape: (6, 9)
┌──────┬─────┬─────┬─────┬───┬─────┬─────┬─────┬──────┐
│ a    ┆ b   ┆ c   ┆ d   ┆ … ┆ q1  ┆ q2  ┆ q3  ┆ q4   │
│ ---  ┆ --- ┆ --- ┆ --- ┆   ┆ --- ┆ --- ┆ --- ┆ ---  │
│ f64  ┆ f64 ┆ f64 ┆ f64 ┆   ┆ f64 ┆ f64 ┆ f64 ┆ f64  │
╞══════╪═════╪═════╪═════╪═══╪═════╪═════╪═════╪══════╡
│ 1.0  ┆ 2.0 ┆ 3.0 ┆ 4.0 ┆ … ┆ 2.0 ┆ 3.0 ┆ 4.0 ┆ 5.0  │
│ 10.0 ┆ 8.0 ┆ 6.0 ┆ 4.0 ┆ … ┆ 4.0 ┆ 6.0 ┆ 8.0 ┆ 10.0 │
│ 1.0  ┆ 1.0 ┆ 2.0 ┆ 3.0 ┆ … ┆ 1.0 ┆ 2.0 ┆ 3.0 ┆ 3.0  │
│ 0.1  ┆ 0.2 ┆ 0.3 ┆ 0.4 ┆ … ┆ 0.2 ┆ 0.3 ┆ 0.4 ┆ 0.5  │
│ 0.1  ┆ NaN ┆ 0.2 ┆ NaN ┆ … ┆ 0.2 ┆ 0.3 ┆ NaN ┆ NaN  │
│ NaN  ┆ NaN ┆ NaN ┆ NaN ┆ … ┆ NaN ┆ NaN ┆ NaN ┆ NaN  │
└──────┴─────┴─────┴─────┴───┴─────┴─────┴─────┴──────┘

I used the walrus operator to create _df so as to not have to invoke .with_row_index twice. If you prefer you can just do df=df.with_row_index('i') first instead.