Search code examples
pythonpython-polars

Python Polars and Pandas Fill Missing Data Forward Edges


I am familiar with fill missing data within Polars and Pandas as in the below quick example:

df = pl.DataFrame({"date": [dt.datetime(2023, 8, 2), dt.datetime(2023, 8, 3), dt.datetime(2023, 8, 4)], "b": ["test", "test", "test2"], "a": [1, None, 3]})

date b a
2023-08-02 00:00:00 test 1
2023-08-03 00:00:00 test null
2023-08-04 00:00:00 test2 3

df = df.with_columns(pl.col('a').fill_null(pl.lit(0)))

date b a
2023-08-02 00:00:00 test 1
2023-08-03 00:00:00 test 0
2023-08-04 00:00:00 test2 3

I know there are other ways than the above such as filling with the previous group value of the group; however, what I cannot seem to find is a means to fill edge data such as in the above.

So if the above was collected on August 10th then dates August 5th to 10th are missing. I would like to fill in those dates for each group with zeroes (0).

Desired results:

date b a
2023-08-02 00:00:00 test 1
2023-08-03 00:00:00 test 0
2023-08-04 00:00:00 test2 3
2023-08-05 00:00:00 test 0
2023-08-05 00:00:00 test2 0
2023-08-06 00:00:00 test 0
2023-08-06 00:00:00 test2 0
2023-08-07 00:00:00 test 0
2023-08-07 00:00:00 test2 0
2023-08-08 00:00:00 test 0
2023-08-08 00:00:00 test2 0
2023-08-09 00:00:00 test 0
2023-08-09 00:00:00 test2 0
2023-08-10 00:00:00 test 0
2023-08-10 00:00:00 test2 0

Solution

  • An outer join is probably best:

    df = pl.DataFrame(
        {
            "date": [date(2023, 8, 2), date(2023, 8, 3), date(2023, 8, 4)],
            "b": ["test", "test", "test2"],
            "a": [1, None, 3],
        }
    )
    
    df2 = pl.DataFrame(
        {
            "date": np.tile(
                pl.date_range(date(2023, 8, 2), date(2023, 8, 10), eager=True), 2
            ),
            'b': np.tile(['test', 'test2'], 9),
        }
    )
    
    df.join(df2, on=['date', 'b'], how='outer').fill_null(0)