Search code examples
pythondatepython-polars

Polars YYYY week into a date


Does anyone know how to parse YYYY Week into a date column in Polars?
I have tried this code but it throws an error.

import polars as pl

pl.DataFrame({
    "week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col("week").cast(pl.String).str.to_date("%Y%U").alias("date"))
InvalidOperationError: conversion from `str` to `date` failed in column 'week' 
for 6 out of 6 values: ["201901", "201902", … "201944"]

Solution

  • This seems like a bug (although one with the underlying rust package chrono rather than polars itself). I tried using base python's strptime and it ignores the %U and just gives the first of the year for all cases so you can either do string manipulation and math like this (assuming you don't need an exact response)

    pl.DataFrame({
        "week": [201901, 201902, 201903, 201942, 201943, 201944]
    }) \
        .with_columns(pl.col('week').cast(pl.Utf8)) \
        .with_columns([pl.col('week').str.slice(0,4).cast(pl.Int32).alias('year'),
                       pl.col('week').str.slice(4,2).cast(pl.Int32).alias('week')]) \
        .select(pl.date(pl.col('year'),1,1) + pl.duration(days=(pl.col('week')-1)*7).alias('date'))
    

    If you look at the definition of %U, it's supposed to be based the xth Sunday of the year whereas my math is just multiplying by 7.

    Another approach is to make a df of dates, then make the strftime of them and then join the dfs. So that might be like this:

    dfdates=pl.DataFrame({'date':pl.date_range(datetime(2019,1,1), datetime(2019,12,31),'1d').cast(pl.Date())}) \
            .with_columns(pl.col('date').dt.strftime("%Y%U").alias('week')) \
            .groupby('week').agg(pl.col('date').min())
    

    And then joining it with what you have

    pl.DataFrame({
        "week": [201901, 201902, 201903, 201942, 201943, 201944]
    }).with_columns(pl.col('week').cast(pl.Utf8())).join(dfdates, on='week')
    
    shape: (6, 2)
    ┌────────┬────────────┐
    │ week   ┆ date       │
    │ ---    ┆ ---        │
    │ str    ┆ date       │
    ╞════════╪════════════╡
    │ 201903 ┆ 2019-01-20 │
    │ 201944 ┆ 2019-11-03 │
    │ 201902 ┆ 2019-01-13 │
    │ 201943 ┆ 2019-10-27 │
    │ 201942 ┆ 2019-10-20 │
    │ 201901 ┆ 2019-01-06 │
    └────────┴────────────┘