How to apply rolling_map() in Python Polars for a function that uses multiple input columns


I have a function using Polars Expressions to calculate the standard deviation of the residuals from a linear regression (courtesy of this post).

Now I would like to apply this function using a rolling window over a dataframe. My approaches below fail because I don't know how to pass two columns as arguments to the function, since rolling_map() applies to an Expr.

Is there a way to do this directly in Polars, or do I need to use a workaround with Pandas? Thank you for your support! (feels like I'm missing something obvious here...)

import polars as pl


def ols_residuals_std(x: pl.Expr, y: pl.Expr) -> pl.Expr:
    # Calculate linear regression residuals and return the standard deviation thereof
    x_center = x - x.mean()
    y_center = y - y.mean()
    beta = x_center.dot(y_center) / x_center.pow(2).sum()
    e = y_center - beta * x_center
    return e.std()


df = pl.DataFrame({'a': [45, 76, 4, 88, 66, 5, 24, 72, 93, 87, 23, 40],
                   'b': [77, 11, 56, 43, 61, 25, 63, 7, 66, 17, 64, 75]})

# Applying the function over the full length - works
# (not assigned back to df, so df keeps only columns 'a' and 'b')
df.with_columns(ols_residuals_std(pl.col('a'), pl.col('b')).alias('e_std'))


df.with_columns(pl.col('a').rolling_map(ols_residuals_std(pl.col('a'), pl.col('b')), window_size=4, min_periods=1).alias('e_std_win'))
# PanicException: python function failed: PyErr { type: <class 'TypeError'>, value: TypeError("'Expr' object is not callable"), traceback: None }

df.with_columns(pl.col('a', 'b').rolling_map(ols_residuals_std(), window_size=4, min_periods=1).alias('e_std_win'))
# TypeError: ols_residuals_std() missing 2 required positional arguments: 'x' and 'y'

Solution

  • One thing to note about rolling_map is that it expects a custom Python function: a callable that takes the values of a window and returns a single value. Your function is a Python function too, but it builds and returns a Polars expression, which is not what rolling_map means by "function". The name hints at this, since map puts it in the same family as map_elements and map_batches. The documentation also carries a warning that it will be extremely slow, which is another hint that it runs plain Python for every window.

    To get what you want, you can use rolling, which evaluates an expression over windows defined on an index column. Unfortunately it doesn't infer an index column, so you have to create one manually (here with with_row_index).

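    (
        df
        .with_row_index('i')  # rolling() needs an explicit index column
        .with_columns(
            ols_residuals_std(pl.col('a'), pl.col('b'))
            .rolling('i', period='4i')  # trailing window of 4 index steps (4 rows here)
            .alias('e_std_win')
        )
        .drop('i')  # the helper index is no longer needed
    )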
    shape: (12, 3)
    ┌─────┬─────┬───────────┐
    │ a   ┆ b   ┆ e_std_win │
    │ --- ┆ --- ┆ ---       │
    │ i64 ┆ i64 ┆ f64       │
    ╞═════╪═════╪═══════════╡
    │ 45  ┆ 77  ┆ 0.0       │
    │ 76  ┆ 11  ┆ 0.0       │
    │ 4   ┆ 56  ┆ 26.832826 │
    │ 88  ┆ 43  ┆ 23.440663 │
    │ …   ┆ …   ┆ …         │
    │ 93  ┆ 66  ┆ 28.72105  │
    │ 87  ┆ 17  ┆ 28.981351 │
    │ 23  ┆ 64  ┆ 29.063269 │
    │ 40  ┆ 75  ┆ 22.362099 │
    └─────┴─────┴───────────┘
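
To sanity-check what the rolling expression computes per window, here is a minimal pure-Python sketch that recomputes the same quantity on explicit trailing slices of up to 4 rows. The names ols_residuals_std_py and rolling_residuals_std are made up for this illustration and are not part of Polars. Returning None for windows where the regression is undefined (a single row, or no variation in x) is a choice of this sketch, so the first row will not match the 0.0 shown in the table above.

```python
import statistics


def ols_residuals_std_py(xs, ys):
    """Sample standard deviation (ddof=1) of the OLS residuals of ys on xs.

    Hypothetical helper for cross-checking; returns None when the
    regression is undefined (fewer than 2 points, or constant xs).
    """
    n = len(xs)
    if n < 2:
        return None
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    x_center = [x - x_mean for x in xs]
    y_center = [y - y_mean for y in ys]
    sxx = sum(u * u for u in x_center)
    if sxx == 0:
        return None
    beta = sum(u * v for u, v in zip(x_center, y_center)) / sxx
    residuals = [v - beta * u for u, v in zip(x_center, y_center)]
    return statistics.stdev(residuals)


def rolling_residuals_std(xs, ys, window=4):
    """Apply ols_residuals_std_py to trailing windows of at most `window` rows."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)  # rows lo..i inclusive
        out.append(ols_residuals_std_py(xs[lo:i + 1], ys[lo:i + 1]))
    return out


a = [45, 76, 4, 88, 66, 5, 24, 72, 93, 87, 23, 40]
b = [77, 11, 56, 43, 61, 25, 63, 7, 66, 17, 64, 75]
result = rolling_residuals_std(a, b)
print(result)
```

From the third row onward, the values should agree with the e_std_win column above up to floating-point rounding. This loop is only meant for verification on small data; the rolling expression stays inside Polars and avoids the per-window Python overhead.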