
Compute standard deviation for polars dataframe rows for set of columns


I would like to calculate the row-wise standard deviation across the columns 'foo' and 'bar' of a dataframe.

I can compute the min, max, and mean this way, but not the std.

import polars as pl

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6, 7, 8],
        "ham": ["a", "b", "c"],
    }
)

# there are _horizontal functions for sum, min, max

df = df.with_columns(
    pl.sum_horizontal('foo','bar')
      .round(2)
      .alias('sum')
)

However, there is no std_horizontal function:

df = df.with_columns(
    pl.std_horizontal('foo','bar')
      .round(2)
      .alias('std')
)

# AttributeError: module 'polars' has no attribute 'std_horizontal'

Is there a better way to compute the standard deviation in this scenario?


Solution

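  • Until a dedicated std_horizontal is added:

    Another way to get a "row" or "horizontal" context is to use the List API: combine the columns into a single list column with pl.concat_list, then apply the list aggregations to it.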

    df.with_columns(
        sum=pl.concat_list("foo", "bar").list.sum(),
        std=pl.concat_list("foo", "bar").list.std(),
    )
    
    shape: (3, 5)
    ┌─────┬─────┬─────┬─────┬──────────┐
    │ foo ┆ bar ┆ ham ┆ sum ┆ std      │
    │ --- ┆ --- ┆ --- ┆ --- ┆ ---      │
    │ i64 ┆ i64 ┆ str ┆ i64 ┆ f64      │
    ╞═════╪═════╪═════╪═════╪══════════╡
    │ 1   ┆ 6   ┆ a   ┆ 7   ┆ 3.535534 │
    │ 2   ┆ 7   ┆ b   ┆ 9   ┆ 3.535534 │
    │ 3   ┆ 8   ┆ c   ┆ 11  ┆ 3.535534 │
    └─────┴─────┴─────┴─────┴──────────┘