Tags: correlation, rolling-computation, python-polars

How to get rolling correlation in Python Polars?


How can I compute a rolling correlation in Python Polars? Or, at the very least, a correlation per row, something like:

pl.corr(pl.col('col_1'), pl.col('col_2'))

I am aware of the pandas solution:

pd_df = result_df.to_pandas()
rol_corr_df = pd_df['col_1'].rolling(5).corr(pd_df['col_2'])
pl_df = pl_df.with_columns(correlation=pl.from_pandas(rol_corr_df))

Solution

  • Polars has rolling, but it needs to be pointed at a temporal or integer column that defines the windows.

    If you just want the windows to follow row position, you can use with_row_index to create an index column.

    Assume we start with

    import numpy as np
    import polars as pl

    df = pl.DataFrame({'a': np.random.uniform(1, 100, 100), 'b': np.random.uniform(1, 100, 100)})
    

    then we could do the following:

    df \
        .with_row_index('i') \
        .rolling('i', period='10i') \
        .agg(rolling_corr=pl.corr('a','b')) \
        .drop('i')
    
    
    
    shape: (100, 1)
    ┌──────────────┐
    │ rolling_corr │
    │ ---          │
    │ f64          │
    ╞══════════════╡
    │ NaN          │
    │ 1.0          │
    │ -0.419386    │
    │ -0.322489    │
    │ …            │
    │ -0.333332    │
    │ -0.027533    │
    │ 0.081232     │
    │ 0.151985     │
    └──────────────┘
    

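    Note that in rolling, the period is given as the string '10i', meaning a window of 10 index values (here, 10 rows). With a datetime index column there are more options, such as durations like '1h' or '7d'; see the Polars documentation for DataFrame.rolling.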