Search code examples
pythondataframepython-polars

How to count work days between date columns with Polars


enter image description here

Assume there are two columns name date and maturity_date in a Polars DataFrame, they are both the type pl.Date. If use pl.col("maturity_date") - pl.col("date") we could get the diff days.

But how to get the numdays of the workdays between date and maturity_date(not count saturday and sunday)?

Or even calculate diff days that use a given calendar like a trade date calendar of stock market which is different from a normal calendar.

Use pl.concat_list([pl.col("date"), pl.col("maturity_date")]).apply(func) to call a udf python func could work, but this would be slow.

[update]

data = pl.DataFrame( [ pl.Series("date", [datetime.date(2000, 1, 4), datetime.date(2000, 1, 4), datetime.date(2000, 1, 4), datetime.date(2000, 1, 4), datetime.date(2000, 1, 4)], dtype=pl.Date), pl.Series("maturity_date", [datetime.date(2000, 1, 17), datetime.date(2000, 2, 15), datetime.date(2000, 3, 15), datetime.date(2000, 4, 17), datetime.date(2000, 5, 15)], dtype=pl.Date), ] )

data.with_columns(
    pl.date_ranges(e.date, e.maturity_date, closed="both", eager=False)
    .list.eval(pl.element().dt.weekday() <= 5)
    .list.count_match(True)
    .alias("duration")
        
    # pl.concat_list(["date", "maturity_date"]).apply(lambda x: get_work_days(x[0], x[1]))
    # .alias("duration")
)

I use this date_ranges to count, but it seems only a little faster than apply


Solution

  • @Dean MacGregor's excellent answer using pure Polars wasn't quite performant enough for my use case. Using Numpy's built in busday_count function turned out to be much faster in my case and can easily be converted back to a polars with pl.from_numpy.

    nyse_holidays=pl.Series([
        date(2000,1,17),
        date(2000,2,21),
        date(2000,4,21),
        date(2000,5,29),
        date(2000,7,4),
        date(2000,9,4),
        date(2000,11,23),
        date(2000,12,25),
    ])
    df = pl.DataFrame(
        {
            "business_days": pl.from_numpy(np.busday_count(data["date"], data["maturity_date"]),["business_days"])["business_days"],
            "trade_days": pl.from_numpy(np.busday_count(data["date"], data["maturity_date"], holidays=nyse_holidays),["trade_days"])["trade_days"],
        }
    )
    shape: (5, 2)
    ┌───────────────┬────────────┐
    │ business_days ┆ trade_days │
    │ ---           ┆ ---        │
    │ i64           ┆ i64        │
    ╞═══════════════╪════════════╡
    │ 9             ┆ 9          │
    │ 30            ┆ 29         │
    │ 51            ┆ 49         │
    │ 74            ┆ 72         │
    │ 94            ┆ 91         │
    └───────────────┴────────────┘