I'm learning polars (as substitute of pandas) and I would reply some excel functions.
In particular average if over a rolling windows.
Let us suppose we have a column with positive and negative value, how can I create a new column with rolling average only if all the value in the column are positive?
import polars as pl
df = pl.DataFrame(
{
"Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
"Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
}
)
df = df.with_columns(
pl.col("Close").pct_change().alias("Close Returns")
)
This creates a data frame with the column "Close Returns" and the new column will be it's average on a fixed windows only if the are all positive.
And if I want to create a new column as result of quotient positive average over negative?
As example for a window of two elements, in the image below there is the first which is null and do nothing. First widows contains a positive and a negative so returns zero (I need 2 positive value) while last window contains two negative and the mean can be computed.
Here my solution but I'm not satisfied:
import polars as pl
dataset = pl.DataFrame(
{
"Date": ["12/04/98", "19/04/98", "26/04/98", "03/05/98", "10/05/98", "17/05/98", "24/05/98", "31/05/98", "07/06/98"],
"Close": [15.46 ,15.09 ,16.13 ,15.13 ,14.47 ,14.78 ,15.20 ,15.07 ,12.59]
}
)
q = dataset.lazy().with_columns(pl.col("Date").str.to_date())
df = q.collect()
df = df.with_columns(
pl.col("Close").pct_change().alias("Close Returns")
)
lag_vector = [2, 6, 7, 10, 12, 13]
for lag in lag_vector:
out = df.rolling(
index_column="Date", period=f"{lag}w"
).agg(
pl.col("Close Returns").filter(pl.col("Close Returns") >= 0).mean().alias("positive mean"),
pl.col("Close Returns").filter(pl.col("Close Returns") < 0).mean().alias("negative mean"),
).with_columns(
pl.col("negative mean", "positive mean").fill_null(0)
)
out = out.with_columns(
(pl.col("positive mean") / (pl.col("positive mean") - pl.col("negative mean"))).alias(f"{lag} lag mean"),
)
df = df.join(out.select("Date", f"{lag} lag mean"), left_on="Date", right_on="Date")
Edit: I've tweaked my answer to use the any
expression so that the non-negative windowed mean is calculated if any (rather than all) of the values in the window is non-negative. Likewise, for the negative windowed mean.
lag_vector = [1, 2, 3]
for lag in lag_vector:
out = (
df
.rolling(index_column="Date", period=f"{lag}w").agg(
pl.col('Close Returns').alias('Close Returns list'),
pl.when((pl.col("Close Returns") >= 0).any())
.then(pl.col('Close Returns').filter(pl.col("Close Returns") >= 0).mean())
.otherwise(0)
.alias("positive mean"),
pl.when((pl.col("Close Returns") < 0).any())
.then(pl.col('Close Returns').filter(pl.col("Close Returns") < 0).mean())
.otherwise(0)
.alias("negative mean"),
)
)
print(out)
Window size 1 week:
shape: (9, 4)
┌────────────┬────────────────────┬───────────────┬───────────────┐
│ Date ┆ Close Returns list ┆ positive mean ┆ negative mean │
│ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ list[f64] ┆ f64 ┆ f64 │
╞════════════╪════════════════════╪═══════════════╪═══════════════╡
│ 0098-04-12 ┆ [null] ┆ 0.0 ┆ 0.0 │
│ 0098-04-19 ┆ [-0.023933] ┆ 0.0 ┆ -0.023933 │
│ 0098-04-26 ┆ [0.06892] ┆ 0.06892 ┆ 0.0 │
│ 0098-05-03 ┆ [-0.061996] ┆ 0.0 ┆ -0.061996 │
│ 0098-05-10 ┆ [-0.043622] ┆ 0.0 ┆ -0.043622 │
│ 0098-05-17 ┆ [0.021424] ┆ 0.021424 ┆ 0.0 │
│ 0098-05-24 ┆ [0.028417] ┆ 0.028417 ┆ 0.0 │
│ 0098-05-31 ┆ [-0.008553] ┆ 0.0 ┆ -0.008553 │
│ 0098-06-07 ┆ [-0.164565] ┆ 0.0 ┆ -0.164565 │
└────────────┴────────────────────┴───────────────┴───────────────┘
Window size 2 weeks:
shape: (9, 4)
┌────────────┬────────────────────────┬───────────────┬───────────────┐
│ Date ┆ Close Returns list ┆ positive mean ┆ negative mean │
│ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ list[f64] ┆ f64 ┆ f64 │
╞════════════╪════════════════════════╪═══════════════╪═══════════════╡
│ 0098-04-12 ┆ [null] ┆ 0.0 ┆ 0.0 │
│ 0098-04-19 ┆ [null, -0.023933] ┆ 0.0 ┆ -0.023933 │
│ 0098-04-26 ┆ [-0.023933, 0.06892] ┆ 0.06892 ┆ -0.023933 │
│ 0098-05-03 ┆ [0.06892, -0.061996] ┆ 0.06892 ┆ -0.061996 │
│ 0098-05-10 ┆ [-0.061996, -0.043622] ┆ 0.0 ┆ -0.052809 │
│ 0098-05-17 ┆ [-0.043622, 0.021424] ┆ 0.021424 ┆ -0.043622 │
│ 0098-05-24 ┆ [0.021424, 0.028417] ┆ 0.02492 ┆ 0.0 │
│ 0098-05-31 ┆ [0.028417, -0.008553] ┆ 0.028417 ┆ -0.008553 │
│ 0098-06-07 ┆ [-0.008553, -0.164565] ┆ 0.0 ┆ -0.086559 │
└────────────┴────────────────────────┴───────────────┴───────────────┘
Window size 3 weeks:
shape: (9, 4)
┌────────────┬─────────────────────────────────┬───────────────┬───────────────┐
│ Date ┆ Close Returns list ┆ positive mean ┆ negative mean │
│ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ list[f64] ┆ f64 ┆ f64 │
╞════════════╪═════════════════════════════════╪═══════════════╪═══════════════╡
│ 0098-04-12 ┆ [null] ┆ 0.0 ┆ 0.0 │
│ 0098-04-19 ┆ [null, -0.023933] ┆ 0.0 ┆ -0.023933 │
│ 0098-04-26 ┆ [null, -0.023933, 0.06892] ┆ 0.06892 ┆ -0.023933 │
│ 0098-05-03 ┆ [-0.023933, 0.06892, -0.061996… ┆ 0.06892 ┆ -0.042965 │
│ 0098-05-10 ┆ [0.06892, -0.061996, -0.043622… ┆ 0.06892 ┆ -0.052809 │
│ 0098-05-17 ┆ [-0.061996, -0.043622, 0.02142… ┆ 0.021424 ┆ -0.052809 │
│ 0098-05-24 ┆ [-0.043622, 0.021424, 0.028417… ┆ 0.02492 ┆ -0.043622 │
│ 0098-05-31 ┆ [0.021424, 0.028417, -0.008553… ┆ 0.02492 ┆ -0.008553 │
│ 0098-06-07 ┆ [0.028417, -0.008553, -0.16456… ┆ 0.028417 ┆ -0.086559 │
└────────────┴─────────────────────────────────┴───────────────┴───────────────┘
Is this closer to what you are looking for?