Search code examples
pythondataframepython-polars

Calculate cumulative sum after percent change in Polars?


import polars as pl

url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"

df = pl.read_csv(url)

change = (
    df.with_columns(pl.col("sepal_width").shift(1, fill_value=0).pct_change(1).alias("pct_1"))
      .with_columns(
          pl.when(pl.col("pct_1").is_infinite())
            .then(float(0))
            .otherwise(pl.col("pct_1"))
            .fill_null(float(0))
            .name.keep(),
          pl.col("pct_1").cum_sum().alias("cumsum_pct_1")
      )
)

Given a dataset, I want to calculate cumsum after using pct_change. But the results are inf even after fill. Have searched for a while. Someone helps me, please.

Now:

┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species   ┆ pct_1     ┆ cumsum_pct_1 │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       ┆ ---       ┆ ---          │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       ┆ f64       ┆ f64          │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ null         │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ inf          │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    ┆ -0.142857 ┆ inf          │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    ┆ 0.066667  ┆ inf          │

Expected:

┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species   ┆ pct_1     ┆ cumsum_pct_1 │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       ┆ ---       ┆ ---          │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       ┆ f64       ┆ f64          │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ 0.0          │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ 0.0          │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    ┆ -0.142857 ┆ -0.142857    │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    ┆ 0.066667  ┆ -0.076190    │


Solution

  • I think the problem you have is a small misunderstanding how expression work inside a context ("with_columns"). All expression inside a context run in parallel so changes of one column aren't visible to the other expressions. So to solve your problem you have to do it step wise

    change = (
        df.with_columns(
            (pl.col("sepal_width").shift(1, fill_value=0).pct_change(1).alias("pct_1"))
        )
        .with_columns(
            pl.when(pl.col("pct_1").is_infinite())
            .then(float(0))
            .otherwise(pl.col("pct_1"))
            .fill_null(float(0))
            .name.keep()
        )
        .with_columns(pl.col("pct_1").cum_sum().alias("cumsum_pct_1"))
    )
    
    shape: (150, 7)
    ┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
    │ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species   ┆ pct_1     ┆ cumsum_pct_1 │
    │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       ┆ ---       ┆ ---          │
    │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       ┆ f64       ┆ f64          │
    ╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
    │ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ 0.0          │
    │ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ 0.0          │
    │ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    ┆ -0.142857 ┆ -0.142857    │
    │ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    ┆ 0.066667  ┆ -0.07619     │
    │ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ -0.03125  ┆ -0.10744     │
    │ …            ┆ …           ┆ …            ┆ …           ┆ …         ┆ …         ┆ …            │
    │ 6.7          ┆ 3.0         ┆ 5.2          ┆ 2.3         ┆ virginica ┆ 0.03125   ┆ 1.690664     │
    │ 6.3          ┆ 2.5         ┆ 5.0          ┆ 1.9         ┆ virginica ┆ -0.090909 ┆ 1.599755     │
    │ 6.5          ┆ 3.0         ┆ 5.2          ┆ 2.0         ┆ virginica ┆ -0.166667 ┆ 1.433088     │
    │ 6.2          ┆ 3.4         ┆ 5.4          ┆ 2.3         ┆ virginica ┆ 0.2       ┆ 1.633088     │
    │ 5.9          ┆ 3.0         ┆ 5.1          ┆ 1.8         ┆ virginica ┆ 0.133333  ┆ 1.766421     │
    └──────────────┴─────────────┴──────────────┴─────────────┴───────────┴───────────┴──────────────┘
    

    see also this question