polars: compute row-wise quantile over DataFrame

I have some polars DataFrames over which I want to compute some row-wise statistics.

For some statistics a dedicated .list.<func> method exists (e.g. list.mean); for those that don't have one, I believe I must use list.eval.

For the following example data:

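import numpy as np
import polars as pl

# np.nan is used because the np.NAN alias was removed in NumPy 2.0
df = pl.DataFrame({
    'a': [1,10,1,.1,.1,     np.nan],
    'b': [2, 8,1,.2, np.nan,np.nan],
    'c': [3, 6,2,.3,.2,     np.nan],
    'd': [4, 4,3,.4, np.nan,np.nan],
    'e': [5, 2,3,.5,.3,     np.nan],
}, strict=False)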

I have managed to come up with the following expression.

It seems that list.eval returns a list (which I suppose is more generic) so I need to call .explode on the resulting 1-element list to get back a single value.

The resulting column takes the name of the first column, so I then need to call .alias to give it a more meaningful name.

Is this the recommended way of computing row-wise statistics?


  • I would unpivot and join here. It should be faster than .list.eval, and it lets you add other row-wise aggregations more easily. Note that I've added q2, q3 and q4 to the agg.

                for q,x in {'q1':0.25,'q2':0.50, 'q3':0.75, 'q4':1}.items()
    shape: (6, 9)
    │ a    ┆ b   ┆ c   ┆ d   ┆ … ┆ q1  ┆ q2  ┆ q3  ┆ q4   │
    │ ---  ┆ --- ┆ --- ┆ --- ┆   ┆ --- ┆ --- ┆ --- ┆ ---  │
    │ f64  ┆ f64 ┆ f64 ┆ f64 ┆   ┆ f64 ┆ f64 ┆ f64 ┆ f64  │
    │ 1.0  ┆ 2.0 ┆ 3.0 ┆ 4.0 ┆ … ┆ 2.0 ┆ 3.0 ┆ 4.0 ┆ 5.0  │
    │ 10.0 ┆ 8.0 ┆ 6.0 ┆ 4.0 ┆ … ┆ 4.0 ┆ 6.0 ┆ 8.0 ┆ 10.0 │
    │ 1.0  ┆ 1.0 ┆ 2.0 ┆ 3.0 ┆ … ┆ 1.0 ┆ 2.0 ┆ 3.0 ┆ 3.0  │
    │ 0.1  ┆ 0.2 ┆ 0.3 ┆ 0.4 ┆ … ┆ 0.2 ┆ 0.3 ┆ 0.4 ┆ 0.5  │
    │ 0.1  ┆ NaN ┆ 0.2 ┆ NaN ┆ … ┆ 0.2 ┆ 0.3 ┆ NaN ┆ NaN  │
    │ NaN  ┆ NaN ┆ NaN ┆ NaN ┆ … ┆ NaN ┆ NaN ┆ NaN ┆ NaN  │

    I used the walrus operator to create _df so that .with_row_index doesn't have to be invoked twice. If you prefer, you can just do df = df.with_row_index('i') first instead.