Search code examples
pythondatetimedurationpython-polars

How to get the days between today and a polars date?


I'm having a bit of trouble with my python code. I originally wrote it using pandas, but I need something a bit faster, so I'm converting it to polars.

After reading the mongodb into polars dataframes with

race = pl.DataFrame(list(race_coll.find()))

and converting the 'date_of_race' column into pl.Date type using

race = race.with_columns(pl.col('date_of_race').str.strptime(pl.Date, format='%d %m %Y').cast(pl.Date))

The pandas code that worked was

days_between = (pd.to_datetime('today') - race.date_of_race.values[0]) // np.timedelta64(1,'D')

I have tried the following:

date = pl.DataFrame({"date_of_race": [1], "value": race['date_of_race']})
days_between = (pd.to_datetime('today').normalize() - days_between[0][0]) // np.timedelta64(1,'D')

TypeError: 'int' object is not subscriptable
days_between = (pd.to_datetime('today').normalize() - race['date_of_race']) // np.timedelta64(1,'D')

PanicException: cannot coerce datatypes: ComputeError(ErrString("failed to determine supertype of object and date"))

When I print the dates, I get the following:

pandas:
print(race.date_of_race.values[0])

2022-10-15T00:00:00.000000000


polars:
print(race['date_of_race'])

shape: (1,)
Series: 'date_of_race' [date]
[
    2022-10-15
]

Any help is appreciated


Solution

  • use a Python datetime object for the reference date, and .dt.total_days() to get the days difference. EX:

    import polars as pl
    import pandas as pd
    
    s = pl.Series([
        "2022-10-30T00:00:00",
        "2022-10-30T01:00:00",
        "2022-10-30T02:00:00",
        "2022-10-30T03:00:00",
        "2022-10-30T04:00:00",
        "2022-10-30T05:00:00",
    ]).cast(pl.Datetime)
    
    diff = pd.to_datetime('today').normalize().to_pydatetime() - s
    # could also use the datetime module's date class here via
    # datetime.today().date()
    
    print(diff)
    # Series: '' [duration[μs]]
    # [
    #   299d
    #   298d 23h
    #   298d 22h
    #   298d 21h
    #   298d 20h
    #   298d 19h
    # ]
    
    print(diff.dt.total_days())
    # Series: '' [i64]
    # [
    #   299
    #   298
    #   298
    #   298
    #   298
    #   298
    # ]