Search code examples
python-polars

Excel equivalent average if on moving window


I'm learning polars (as substitute of pandas) and I would reply some excel functions.

In particular average if over a rolling windows.

Let us suppose we have a column with positive and negative value, how can I create a new column with rolling average only if all the value in the column are positive?

import polars as pl

df = pl.DataFrame(
    {
        "Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
        "Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
    }
)

df = df.with_columns(
    pl.col("Close").pct_change().alias("Close Returns")
)

This creates a data frame with the column "Close Returns" and the new column will be it's average on a fixed windows only if the are all positive.

And if I want to create a new column as result of quotient positive average over negative?

As example for a window of two elements, in the image below there is the first which is null and do nothing. First widows contains a positive and a negative so returns zero (I need 2 positive value) while last window contains two negative and the mean can be computed.

enter image description here

Here my solution but I'm not satisfied:

import polars as pl

dataset = pl.DataFrame(
    {
        "Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
        "Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
    }
)

q = dataset.lazy().with_columns(pl.col("Date").str.to_date())

df = q.collect()
df = df.with_columns(
    pl.col("Close").pct_change().alias("Close Returns")
)

lag_vector = [2, 6, 7, 10, 12, 13]

for lag in lag_vector:
    out = df.rolling(
        index_column="Date", period=f"{lag}w"
    ).agg(
        pl.col("Close Returns").filter(pl.col("Close Returns") >= 0).mean().alias("positive mean"),
        pl.col("Close Returns").filter(pl.col("Close Returns") < 0).mean().alias("negative mean"),
    ).with_columns(
        pl.col("negative mean", "positive mean").fill_null(0)
    )
    out = out.with_columns(
        (pl.col("positive mean") / (pl.col("positive mean") - pl.col("negative mean"))).alias(f"{lag} lag mean"),
    )
    df = df.join(out.select("Date", f"{lag} lag mean"), left_on="Date", right_on="Date")

Solution

  • Edit: I've tweaked my answer to use the any expression so that the non-negative windowed mean is calculated if any (rather than all) of the values in the window is non-negative. Likewise, for the negative windowed mean.

    lag_vector = [1, 2, 3]
    for lag in lag_vector:
        out = (
            df
            .rolling(index_column="Date", period=f"{lag}w").agg(
                pl.col('Close Returns').alias('Close Returns list'),
                pl.when((pl.col("Close Returns") >= 0).any())
                .then(pl.col('Close Returns').filter(pl.col("Close Returns") >= 0).mean())
                .otherwise(0)
                .alias("positive mean"),
                pl.when((pl.col("Close Returns") < 0).any())
                .then(pl.col('Close Returns').filter(pl.col("Close Returns") < 0).mean())
                .otherwise(0)
                .alias("negative mean"),
            )
        )
    
        print(out)
    

    Window size 1 week:

    shape: (9, 4)
    ┌────────────┬────────────────────┬───────────────┬───────────────┐
    │ Date       ┆ Close Returns list ┆ positive mean ┆ negative mean │
    │ ---        ┆ ---                ┆ ---           ┆ ---           │
    │ date       ┆ list[f64]          ┆ f64           ┆ f64           │
    ╞════════════╪════════════════════╪═══════════════╪═══════════════╡
    │ 0098-04-12 ┆ [null]             ┆ 0.0           ┆ 0.0           │
    │ 0098-04-19 ┆ [-0.023933]        ┆ 0.0           ┆ -0.023933     │
    │ 0098-04-26 ┆ [0.06892]          ┆ 0.06892       ┆ 0.0           │
    │ 0098-05-03 ┆ [-0.061996]        ┆ 0.0           ┆ -0.061996     │
    │ 0098-05-10 ┆ [-0.043622]        ┆ 0.0           ┆ -0.043622     │
    │ 0098-05-17 ┆ [0.021424]         ┆ 0.021424      ┆ 0.0           │
    │ 0098-05-24 ┆ [0.028417]         ┆ 0.028417      ┆ 0.0           │
    │ 0098-05-31 ┆ [-0.008553]        ┆ 0.0           ┆ -0.008553     │
    │ 0098-06-07 ┆ [-0.164565]        ┆ 0.0           ┆ -0.164565     │
    └────────────┴────────────────────┴───────────────┴───────────────┘
    

    Window size 2 weeks:

    shape: (9, 4)
    ┌────────────┬────────────────────────┬───────────────┬───────────────┐
    │ Date       ┆ Close Returns list     ┆ positive mean ┆ negative mean │
    │ ---        ┆ ---                    ┆ ---           ┆ ---           │
    │ date       ┆ list[f64]              ┆ f64           ┆ f64           │
    ╞════════════╪════════════════════════╪═══════════════╪═══════════════╡
    │ 0098-04-12 ┆ [null]                 ┆ 0.0           ┆ 0.0           │
    │ 0098-04-19 ┆ [null, -0.023933]      ┆ 0.0           ┆ -0.023933     │
    │ 0098-04-26 ┆ [-0.023933, 0.06892]   ┆ 0.06892       ┆ -0.023933     │
    │ 0098-05-03 ┆ [0.06892, -0.061996]   ┆ 0.06892       ┆ -0.061996     │
    │ 0098-05-10 ┆ [-0.061996, -0.043622] ┆ 0.0           ┆ -0.052809     │
    │ 0098-05-17 ┆ [-0.043622, 0.021424]  ┆ 0.021424      ┆ -0.043622     │
    │ 0098-05-24 ┆ [0.021424, 0.028417]   ┆ 0.02492       ┆ 0.0           │
    │ 0098-05-31 ┆ [0.028417, -0.008553]  ┆ 0.028417      ┆ -0.008553     │
    │ 0098-06-07 ┆ [-0.008553, -0.164565] ┆ 0.0           ┆ -0.086559     │
    └────────────┴────────────────────────┴───────────────┴───────────────┘
    

    Window size 3 weeks:

    shape: (9, 4)
    ┌────────────┬─────────────────────────────────┬───────────────┬───────────────┐
    │ Date       ┆ Close Returns list              ┆ positive mean ┆ negative mean │
    │ ---        ┆ ---                             ┆ ---           ┆ ---           │
    │ date       ┆ list[f64]                       ┆ f64           ┆ f64           │
    ╞════════════╪═════════════════════════════════╪═══════════════╪═══════════════╡
    │ 0098-04-12 ┆ [null]                          ┆ 0.0           ┆ 0.0           │
    │ 0098-04-19 ┆ [null, -0.023933]               ┆ 0.0           ┆ -0.023933     │
    │ 0098-04-26 ┆ [null, -0.023933, 0.06892]      ┆ 0.06892       ┆ -0.023933     │
    │ 0098-05-03 ┆ [-0.023933, 0.06892, -0.061996… ┆ 0.06892       ┆ -0.042965     │
    │ 0098-05-10 ┆ [0.06892, -0.061996, -0.043622… ┆ 0.06892       ┆ -0.052809     │
    │ 0098-05-17 ┆ [-0.061996, -0.043622, 0.02142… ┆ 0.021424      ┆ -0.052809     │
    │ 0098-05-24 ┆ [-0.043622, 0.021424, 0.028417… ┆ 0.02492       ┆ -0.043622     │
    │ 0098-05-31 ┆ [0.021424, 0.028417, -0.008553… ┆ 0.02492       ┆ -0.008553     │
    │ 0098-06-07 ┆ [0.028417, -0.008553, -0.16456… ┆ 0.028417      ┆ -0.086559     │
    └────────────┴─────────────────────────────────┴───────────────┴───────────────┘
    

    Is this closer to what you are looking for?