Search code examples
pythondataframetimedeltapython-polars

Add timedelta to a date column above weeks


How would I add 1 year to a column?

I've tried using map and apply but I failed miserably.

I also wonder why pl.date() accepts integers while it advertises that it only accepts str or pli.Expr.

A small hack workaround is:

col = pl.col('date').dt
df = df.with_columns(pl.when(pl.col(column).is_not_null())
                     .then(pl.date(col.year() + 1, col.month(), col.day()))
                     .otherwise(pl.date(col.year() + 1,col.month(), col.day()))
                     .alias("date"))

but this won't work for months or days. I can't just add a number or I'll get a:

> thread 'thread '<unnamed>' panicked at 'invalid or out-of-range date<unnamed>',
         ' panicked at '/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rsinvalid or out-of-range date:', 173:/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rs51
:note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Most likely because day and month cycle while year goes to infinity.

I could also do this:

df = df.with_columns(
        pl.when(col.month() == 1)
        .then(pl.date(col.year(), 2, col.day()))
        .when(col.month() == 2)
        .then(pl.date(col.year(), 3, col.day()))
        .when(col.month() == 3)
        .then(pl.date(col.year(), 4, col.day()))
        .when(col.month() == 4)
        .then(pl.date(col.year(), 5, col.day()))
        .when(col.month() == 5)
        .then(pl.date(col.year(), 6, col.day()))
        .when(col.month() == 6)
        .then(pl.date(col.year(), 7, col.day()))
        .when(col.month() == 7)
        .then(pl.date(col.year(), 8, col.day()))
        .when(col.month() == 8)
        .then(pl.date(col.year(), 9, col.day()))
        .when(col.month() == 9)
        .then(pl.date(col.year(), 10, col.day()))
        .when(col.month() == 10)
        .then(pl.date(col.year(), 11, col.day()))
        .when(col.month() == 11)
        .then(pl.date(col.year(), 12, col.day()))
        .otherwise(pl.date(col.year() + 1, 1, 1))
        .alias("valid_from")
    )

Solution

  • Polars allows to do addition and subtraction with python's timedelta objects. However above week units things get a bit more complicated as we have to take different days of the month and leap years into account.

    For this polars has offset_by under the dt namespace.

    (pl.DataFrame({
        "dates": pl.datetime_range(pl.datetime(2000, 1, 1), pl.datetime(2026, 1, 1), "1y", eager=True)
    }).with_columns(
        pl.col("dates").dt.offset_by("1y").alias("dates_and_1_yr")
    ))
    
    shape: (27, 2)
    ┌─────────────────────┬─────────────────────┐
    │ dates               ┆ dates_and_1_yr      │
    │ ---                 ┆ ---                 │
    │ datetime[μs]        ┆ datetime[μs]        │
    ╞═════════════════════╪═════════════════════╡
    │ 2000-01-01 00:00:00 ┆ 2001-01-01 00:00:00 │
    │ 2001-01-01 00:00:00 ┆ 2002-01-01 00:00:00 │
    │ 2002-01-01 00:00:00 ┆ 2003-01-01 00:00:00 │
    │ 2003-01-01 00:00:00 ┆ 2004-01-01 00:00:00 │
    │ 2004-01-01 00:00:00 ┆ 2005-01-01 00:00:00 │
    │ …                   ┆ …                   │
    │ 2022-01-01 00:00:00 ┆ 2023-01-01 00:00:00 │
    │ 2023-01-01 00:00:00 ┆ 2024-01-01 00:00:00 │
    │ 2024-01-01 00:00:00 ┆ 2025-01-01 00:00:00 │
    │ 2025-01-01 00:00:00 ┆ 2026-01-01 00:00:00 │
    │ 2026-01-01 00:00:00 ┆ 2027-01-01 00:00:00 │
    └─────────────────────┴─────────────────────┘