Search code examples
pythonsimulationpython-polarsaccounting

Generate a running tally with Polars where the closing balance one month becomes the opening balance for the next month


I want to generate a simulation of a running tally over a period of time given an initial opening balance. By example, an imaginary business starts with $1000 and experiences a random profit and loss during any month. How much might this business have after 12 months? My issue is that I can generate the needed columns, but cannot figure out how to carry the simulated tally on to the next month without clobbering data in the current month.

I can create a Polars dataframe and populate this with the needed profit and loss columns and generate the the closing balance. Using stacked 'with_columns()', I created the needed columns for the opening and closing balances, and then generate some random profit and losses, then update the closing balance. But I cannot get this to carry forward.

import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta
from random import normalvariate

start_date = dt.date.today() + relativedelta(months=1, day=1)
df = pl.DataFrame(
    pl.date_range(start_date, start_date + relativedelta(months=5), '1mo', eager=True).alias('date'),
)
beginning_balance = 1000.0
df = df.with_columns(
    pl.lit(beginning_balance).alias('beginning_balance'),
    pl.lit(beginning_balance).alias('closing_balance'),
).with_columns(
    pl.when(pl.col('date') == start_date)
    .then(pl.col('beginning_balance'))
    .otherwise(pl.col('closing_balance').shift(1))
    .alias('beginning_balance'), 
    pl.Series([normalvariate(100, 80) for _ in range(len(df))]).round(2).alias('profit'),
    pl.Series([normalvariate(100, 75) for _ in range(len(df))]).round(2).alias('loss'),
).with_columns(
    (pl.col('beginning_balance') + pl.col('profit') - pl.col('loss')).alias('closing_balance'),
)
df

resulting in :

shape: (6, 5)
┌────────────┬───────────────────┬─────────────────┬────────┬────────┐
│ date       ┆ beginning_balance ┆ closing_balance ┆ profit ┆ loss   │
│ ---        ┆ ---               ┆ ---             ┆ ---    ┆ ---    │
│ date       ┆ f64               ┆ f64             ┆ f64    ┆ f64    │
╞════════════╪═══════════════════╪═════════════════╪════════╪════════╡
│ 2024-11-01 ┆ 1000.0            ┆ 934.95          ┆ -58.53 ┆ 6.52   │
│ 2024-12-01 ┆ 1000.0            ┆ 903.15          ┆ 69.02  ┆ 165.87 │
│ 2025-01-01 ┆ 1000.0            ┆ 1007.71         ┆ 111.21 ┆ 103.5  │
│ 2025-02-01 ┆ 1000.0            ┆ 1011.97         ┆ 209.43 ┆ 197.46 │
│ 2025-03-01 ┆ 1000.0            ┆ 998.85          ┆ 32.22  ┆ 33.37  │
│ 2025-04-01 ┆ 1000.0            ┆ 1151.32         ┆ 277.49 ┆ 126.17 │
└────────────┴───────────────────┴─────────────────┴────────┴────────┘

Note that the closing balance from the first month (909.61) was not carried to the next month, which remained at 1000.00, thus the running tally is not happening.

I can definitely use a loop (for or while) to walk through each row, but this thwarts the optimization and parallelization of Polars. The following would work, but is much slower (especially when I am running many simulations with much more detailed calculations).

current_tally = beginning_balance
for t in range(len(df)):
    beginning_balance = current_tally
    current_tally = beginning_balance + df[t, 'profit'] - df[t, 'loss']
    df[t, 'beginning_balance'] = beginning_balance
    df[t, 'closing_balance'] = current_tally

df
┌────────────┬───────────────────┬─────────────────┬────────┬────────┐
│ date       ┆ beginning_balance ┆ closing_balance ┆ profit ┆ loss   │
│ ---        ┆ ---               ┆ ---             ┆ ---    ┆ ---    │
│ date       ┆ f64               ┆ f64             ┆ f64    ┆ f64    │
╞════════════╪═══════════════════╪═════════════════╪════════╪════════╡
│ 2024-11-01 ┆ 1000.0            ┆ 934.95          ┆ -58.53 ┆ 6.52   │
│ 2024-12-01 ┆ 934.95            ┆ 838.1           ┆ 69.02  ┆ 165.87 │
│ 2025-01-01 ┆ 838.1             ┆ 845.81          ┆ 111.21 ┆ 103.5  │
│ 2025-02-01 ┆ 845.81            ┆ 857.78          ┆ 209.43 ┆ 197.46 │
│ 2025-03-01 ┆ 857.78            ┆ 856.63          ┆ 32.22  ┆ 33.37  │
│ 2025-04-01 ┆ 856.63            ┆ 1007.95         ┆ 277.49 ┆ 126.17 │
└────────────┴───────────────────┴─────────────────┴────────┴────────┘

How would I do this within Polars?


Solution

  • Alternative solution, which I think is more efficient (since .cumulative_eval is doing a lot of additions multiple times):

    Starting dataframe with date and the random profit and loss:

    import polars as pl
    import datetime as dt
    from dateutil.relativedelta import relativedelta
    
    num_months = 6
    start_date = dt.date.today() + relativedelta(months=1, day=1)
    df = pl.DataFrame().with_columns(
        pl.date_range(start_date, start_date + relativedelta(months=num_months - 1), '1mo').alias('date'),
        pl.Series([normalvariate(100, 80) for _ in range(num_months)]).round(2).alias('profit'),
        pl.Series([normalvariate(100, 75) for _ in range(num_months)]).round(2).alias('loss'),
    )
    

    Building the new columns based on columns profit and loss:

    beginning_balance = 1000.0
    df = df.with_columns(
        (beginning_balance + (pl.col('profit') - pl.col('loss')).cum_sum())
        .alias('closing_balance')
    ).with_columns(
        pl.col('closing_balance').shift(1, fill_value=beginning_balance)
        .alias('beginning_balance')
    )
    

    The tally is calculated by using .cum_sum() on the difference between profit and loss.