Search code examples
pythonsimulationpython-polarsaccounting

Generate a running tally with Polars where the closing balance one month becomes the opening balance for the next month


I want to generate a simulation of a running tally over a period of time given an initial opening balance. By example, an imaginary business starts with $1000 and experiences a random profit and loss during any month. How much might this business have after 12 months? My issue is that I can generate the needed columns, but cannot figure out how to carry the simulated tally on to the next month without clobbering data in the current month.

I can create a Polars dataframe and populate this with the needed profit and loss columns and generate the the closing balance. Using stacked 'with_columns()', I created the needed columns for the opening and closing balances, and then generate some random profit and losses, then update the closing balance. But I cannot get this to carry forward.

import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta
start_date = dt.date.today() + relativedelta(months=1, day=1)
df = pl.DataFrame(
    pl.date_range(start_date, start_date + relativedelta(months=5), '1mo').alias('date'),
)
beginning_balance = 1000.0
df = df.with_columns(
        pl.lit(beginning_balance).alias('beginning_balance'),
        pl.lit(beginning_balance).alias('closing_balance'),
).with_columns(
    [
        pl.when(pl.col('date') == start_date)
        .then(pl.col('beginning_balance'))
        .otherwise(pl.col('closing_balance').shift(1))
        .alias('beginning_balance'), 
        pl.Series([normalvariate(100, 80) for _ in range(len(df))]).round(2).alias('profit'),
        pl.Series([normalvariate(100, 75) for _ in range(len(df))]).round(2).alias('loss'),
    ]
).with_columns(
    (pl.col('beginning_balance') + pl.col('profit') - pl.col('loss')).alias('closing_balance'),
)
df

resulting in :

 shape: (6, 5)
date    beginning_balance   closing_balance     profit  loss
date    f64     f64     f64     f64
2023-04-01  1000.0  909.61  72.97   163.36
2023-05-01  1000.0  1072.29     212.84  140.55
2023-06-01  1000.0  974.54  82.75   108.21
2023-07-01  1000.0  824.08  -60.69  115.23
2023-08-01  1000.0  940.71  155.6   214.89
2023-09-01  1000.0  994.5   94.23   99.73

Note that the closing balance from the first month (909.61) was not carried to the next month, which remained at 1000.00, thus the running tally is not happening.

I can definitely use a loop (for or while) to walk through each row, but this thwarts the optimization and parallelization of Polars. The following would work, but is much slower (especially when I am running many simulations with much more detailed calculations).

current_tally = beginning_balance
for t in range(len(df)):
    beginning_balance = current_tally
    current_tally = beginning_balance + df[t, 'profit'] - df[t, 'loss']
    df[t, 'beginning_balance'] = beginning_balance
    df[t, 'closing_balance'] = current_tally

df
 shape: (6, 5)
date    beginning_balance   closing_balance     profit  loss
date    f64     f64     f64     f64
2023-04-01  1000.0  1063.13     115.49  52.36
2023-05-01  1063.13     1224.37     184.62  23.38
2023-06-01  1224.37     1258.02     114.68  81.03
2023-07-01  1258.02     1323.73     239.12  173.41
2023-08-01  1323.73     1341.33     102.18  84.58
2023-09-01  1341.33     1329.88     19.9    31.35

How would I do this within Polars?


Solution

  • Alternative solution, which I think is more efficient (since .cumulative_eval is doing a lot of additions multiple times):

    Starting dataframe with date and the random profit and loss:

    import polars as pl
    import datetime as dt
    from dateutil.relativedelta import relativedelta
    
    num_months = 6
    start_date = dt.date.today() + relativedelta(months=1, day=1)
    df = pl.DataFrame().with_columns([
        pl.date_range(start_date, start_date + relativedelta(months=num_months - 1), '1mo').alias('date'),
        pl.Series([normalvariate(100, 80) for _ in range(num_months)]).round(2).alias('profit'),
        pl.Series([normalvariate(100, 75) for _ in range(num_months)]).round(2).alias('loss'),
    ])
    

    Building the new columns based on columns profit and loss:

    beginning_balance = 1000.0
    df = df.with_columns(
        (beginning_balance + (pl.col('profit') - pl.col('loss')).cumsum())
        .alias('closing_balance')
    ).with_columns(
        pl.col('closing_balance').shift_and_fill(1, beginning_balance)
        .alias('beginning_balance')
    )
    

    The tally is calculated by using .cumsum() on the difference between profit and loss.