Is it possible to fill_forward only some columns when upsampling using Polars?
For example, would like to fill in the missing dates in sample dataframe (see code below). 'upsample' and 'forward_fill' works beautifully and is blazing fast with a much larger dataset. The output is as expected in the table below. No issues, all as expected.
However: Is it possible to exclude a column for the forward_fill, so for example it returns blanks in the utc_time
column instead of filling the time? I have tried listing the columns in select statement by replacing pl.all() with pl.col(...)
, but that just removes the column that is not listed.
import polars as pl
from datetime import datetime
df = pl.DataFrame(
{
'utc_created':pl.date_range(datetime(2021, 12, 16), datetime(2021, 12, 22, 0), interval="2d", eager=True),
'utc_time':['21:12:06','21:20:06','17:51:10','03:54:49'],
'sku':[9100000801,9100000801,9100000801,9100000801],
'old':[18,17,16,15],
'new':[17,16,15,14],
'alert_type':['Inventory','Inventory','Inventory','Inventory'],
'alert_level':['Info','Info','Info','Info']
}
)
df = (df.upsample(time_column='utc_created',every='1d', group_by='sku')
.select(pl.all().forward_fill())
)
Returns:
shape: (7, 7)
┌─────────────┬──────────┬────────────┬─────┬─────┬────────────┬─────────────┐
│ utc_created ┆ utc_time ┆ sku ┆ old ┆ new ┆ alert_type ┆ alert_level │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ str ┆ i64 ┆ i64 ┆ i64 ┆ str ┆ str │
╞═════════════╪══════════╪════════════╪═════╪═════╪════════════╪═════════════╡
│ 2021-12-16 ┆ 21:12:06 ┆ 9100000801 ┆ 18 ┆ 17 ┆ Inventory ┆ Info │
│ 2021-12-17 ┆ 21:12:06 ┆ 9100000801 ┆ 18 ┆ 17 ┆ Inventory ┆ Info │
│ 2021-12-18 ┆ 21:20:06 ┆ 9100000801 ┆ 17 ┆ 16 ┆ Inventory ┆ Info │
│ 2021-12-19 ┆ 21:20:06 ┆ 9100000801 ┆ 17 ┆ 16 ┆ Inventory ┆ Info │
│ 2021-12-20 ┆ 17:51:10 ┆ 9100000801 ┆ 16 ┆ 15 ┆ Inventory ┆ Info │
│ 2021-12-21 ┆ 17:51:10 ┆ 9100000801 ┆ 16 ┆ 15 ┆ Inventory ┆ Info │
│ 2021-12-22 ┆ 03:54:49 ┆ 9100000801 ┆ 15 ┆ 14 ┆ Inventory ┆ Info │
└─────────────┴──────────┴────────────┴─────┴─────┴────────────┴─────────────┘
You can use pl.exclude()
and .forward_fill()
e.g.
df = (
df.upsample(time_column="utc_created", every="1d", group_by="sku")
.with_columns(pl.exclude("utc_time").forward_fill())
)
shape: (7, 7)
┌─────────────┬──────────┬────────────┬─────┬─────┬────────────┬─────────────┐
│ utc_created ┆ utc_time ┆ sku ┆ old ┆ new ┆ alert_type ┆ alert_level │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ str ┆ i64 ┆ i64 ┆ i64 ┆ str ┆ str │
╞═════════════╪══════════╪════════════╪═════╪═════╪════════════╪═════════════╡
│ 2021-12-16 ┆ 21:12:06 ┆ 9100000801 ┆ 18 ┆ 17 ┆ Inventory ┆ Info │
│ 2021-12-17 ┆ null ┆ 9100000801 ┆ 18 ┆ 17 ┆ Inventory ┆ Info │
│ 2021-12-18 ┆ 21:20:06 ┆ 9100000801 ┆ 17 ┆ 16 ┆ Inventory ┆ Info │
│ 2021-12-19 ┆ null ┆ 9100000801 ┆ 17 ┆ 16 ┆ Inventory ┆ Info │
│ 2021-12-20 ┆ 17:51:10 ┆ 9100000801 ┆ 16 ┆ 15 ┆ Inventory ┆ Info │
│ 2021-12-21 ┆ null ┆ 9100000801 ┆ 16 ┆ 15 ┆ Inventory ┆ Info │
│ 2021-12-22 ┆ 03:54:49 ┆ 9100000801 ┆ 15 ┆ 14 ┆ Inventory ┆ Info │
└─────────────┴──────────┴────────────┴─────┴─────┴────────────┴─────────────┘