I have the following code
import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta
def get_3_month_splits(product: str) -> list[str]:
front, start_dt, total_m = product.rsplit('.', 2)
start_dt = dt.datetime.strptime(start_dt, '%Y%m')
total_m = int(total_m)
return [f'{front}.{(start_dt+relativedelta(months=m)).strftime("%Y%m")}.3' for m in range(0, total_m, 3)]
df = pl.DataFrame({
'product': ['CHECK.GB.202403.12', 'CHECK.DE.202506.6', 'CASH.US.202509.12'],
'qty': [3, 6, -3],
'price': [100, 102, 95],
})
print(df)
df2 = pl.DataFrame([{'product_split': split} | d
for d in df.iter_rows(named=True) for split in get_3_month_splits(d['product'])
])
print(df2)
Essentially, I want to expand each row by splitting the product
into a more detailed product_split
columns, while keeping all other fields (like qty
, price
) the same.
The above accomplishes this, but is there a more native way to achieve this via with_columns
?
It looks like you want to create date ranges:
splits = pl.col("product").str.splitn(".", 4)
start_dt = splits.struct[-2].str.to_date("%Y%m")
total_m = splits.struct[-1]
(
df.with_columns(
pl.date_ranges(
start_dt,
start_dt.dt.offset_by(pl.format("{}mo", total_m.first())),
interval = "3mo",
closed="left"
)
.over(total_m)
.alias("date_ranges")
)
.explode("date_ranges")
.with_columns(
pl.format("{}.{}.{}.{}",
splits.struct[0],
splits.struct[1],
pl.col("date_ranges").dt.to_string("%Y%m"),
splits.struct[-1]
)
.alias("product_split")
)
)
shape: (10, 5)
┌────────────────────┬─────┬───────┬─────────────┬────────────────────┐
│ product ┆ qty ┆ price ┆ date_ranges ┆ product_split │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ date ┆ str │
╞════════════════════╪═════╪═══════╪═════════════╪════════════════════╡
│ CHECK.GB.202403.12 ┆ 3 ┆ 100 ┆ 2024-03-01 ┆ CHECK.GB.202403.12 │
│ CHECK.GB.202403.12 ┆ 3 ┆ 100 ┆ 2024-06-01 ┆ CHECK.GB.202406.12 │
│ CHECK.GB.202403.12 ┆ 3 ┆ 100 ┆ 2024-09-01 ┆ CHECK.GB.202409.12 │
│ CHECK.GB.202403.12 ┆ 3 ┆ 100 ┆ 2024-12-01 ┆ CHECK.GB.202412.12 │
│ CHECK.DE.202506.6 ┆ 6 ┆ 102 ┆ 2025-06-01 ┆ CHECK.DE.202506.6 │
│ CHECK.DE.202506.6 ┆ 6 ┆ 102 ┆ 2025-09-01 ┆ CHECK.DE.202509.6 │
│ CASH.US.202509.12 ┆ -3 ┆ 95 ┆ 2025-09-01 ┆ CASH.US.202509.12 │
│ CASH.US.202509.12 ┆ -3 ┆ 95 ┆ 2025-12-01 ┆ CASH.US.202512.12 │
│ CASH.US.202509.12 ┆ -3 ┆ 95 ┆ 2026-03-01 ┆ CASH.US.202603.12 │
│ CASH.US.202509.12 ┆ -3 ┆ 95 ┆ 2026-06-01 ┆ CASH.US.202606.12 │
└────────────────────┴─────┴───────┴─────────────┴────────────────────┘
The string processing can be done in several ways, it seems your data has a known number of fields, so we can use .str.splitn()
Finally, we .explode()
the list of dates produced by .date_ranges()