
What is the Polars equivalent of converting a numeric (year) column to a date in Python?


I am new to Polars and have been trying to convert a numeric year column to a date column, without success.

Data

import polars as pl
import pandas as pd
import datetime as dt
import plotly.express as px

# creating pandas df
df_gapminder = px.data.gapminder()

# creating polars df
pl_gapminder = pl.DataFrame(df_gapminder)

Converting to date column:

In pandas

df_gapminder['date'] = pd.to_datetime(df_gapminder.year, format="%Y")
df_gapminder.date.head()

############## output ###########
# 0   1952-01-01
# 1   1957-01-01
# 2   1962-01-01
# 3   1967-01-01
# 4   1972-01-01
# Name: date, dtype: datetime64[ns]

In Polars, using the string parser directly raises an error:

pl_gapminder.with_columns(pl.col('year').str.to_date(format='%Y').alias('date'))

Error SchemaError: invalid series dtype: expected String, got i64

In Polars, casting the integer column directly gives wrong results:

pl_gapminder.with_columns(pl.col('year').cast(pl.Date).alias('date')).head()

Wrong output (the days increase in the date column instead of the years):

country        continent  year  lifeExp  pop       gdpPercap   iso_alpha  iso_num  date
str            str        i64   f64      i64       f64         str        i64      date
"Afghanistan"  "Asia"     1952  28.801   8425333   779.445314  "AFG"      4        1975-05-07
"Afghanistan"  "Asia"     1957  30.332   9240934   820.85303   "AFG"      4        1975-05-12
"Afghanistan"  "Asia"     1962  31.997   10267083  853.10071   "AFG"      4        1975-05-17
"Afghanistan"  "Asia"     1967  34.02    11537966  836.197138  "AFG"      4        1975-05-22
"Afghanistan"  "Asia"     1972  36.088   13079460  739.981106  "AFG"      4        1975-05-27

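The dates in the wrong output are consistent with the integer being read as a number of days since 1970-01-01 (the Unix epoch) rather than as a year, which is how a Date is stored. The same arithmetic can be sketched with the standard library alone; the helper name days_to_date is made up for illustration and is not part of Polars:

```python
from datetime import date, timedelta

# Reference point for day counts: the Unix epoch
EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Hypothetical helper: read an integer as days elapsed since the epoch."""
    return EPOCH + timedelta(days=days)


# The year values from the question, misread as day counts
years = [1952, 1957, 1962]
misread = [days_to_date(y) for y in years]

print(misread[0])  # 1975-05-07
```

The years are five apart, so the misread dates are five days apart, matching the date column above.
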
Solution

  • You can cast to string, then parse to date (roughly what pandas does for you under the hood):

    pl_gapminder = pl_gapminder.with_columns(
        pl.col('year')
        .cast(pl.String)
        .str.to_date(format='%Y')  # already returns pl.Date, so no further cast is needed
        .alias('date')
    )
    
    
    pl_gapminder["date"]
    shape: (1704,)
    Series: 'date' [date]
    [
        1952-01-01
        1957-01-01
        1962-01-01
        1967-01-01
        1972-01-01
        1977-01-01
        ...