I am new to polars and have been trying to convert just the year column to a date column, but failed to do so.
Data
import polars as pl
import pandas as pd
import datetime as dt
import plotly.express as px
# creating pandas df
df_gapminder = px.data.gapminder()
# creating polars df
pl_gapminder = pl.DataFrame(df_gapminder)
Converting to date column:
In pandas
df_gapminder['date'] = pd.to_datetime(df_gapminder.year, format="%Y")
df_gapminder.date.head()
############## output ###########
# 0 1952-01-01
# 1 1957-01-01
# 2 1962-01-01
# 3 1967-01-01
# 4 1972-01-01
# Name: date, dtype: datetime64[ns]
In polars
- getting error below
pl_gapminder.with_columns(pl.col('year').str.to_date(format='%Y').alias('date'))
Error SchemaError: invalid series dtype: expected String, got i64
In Polars
- Getting Wrong Results below
pl_gapminder.with_columns(pl.col('year').cast(pl.Date).alias('date')).head()
Wrong output (instead of the years increasing, the days are increasing in the date column):
country continent year lifeExp pop gdpPercap iso_alpha iso_num date
str str i64 f64 i64 f64 str i64 date
"Afghanistan" "Asia" 1952 28.801 8425333 779.445314 "AFG" 4 1975-05-07
"Afghanistan" "Asia" 1957 30.332 9240934 820.85303 "AFG" 4 1975-05-12
"Afghanistan" "Asia" 1962 31.997 10267083 853.10071 "AFG" 4 1975-05-17
"Afghanistan" "Asia" 1967 34.02 11537966 836.197138 "AFG" 4 1975-05-22
"Afghanistan" "Asia" 1972 36.088 13079460 739.981106 "AFG" 4 1975-05-27
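Casting an integer directly to pl.Date treats it as a number of days since 1970-01-01, which is why the days increase instead of the years (1952 days after the epoch is 1975-05-07). You can instead cast to string, then parse to date (what pandas does for you, under the hood):
pl_gapminder = pl_gapminder.with_columns(
    pl.col('year')
    .cast(pl.String)           # i64 -> String, so the .str namespace applies
    .str.to_date(format='%Y')  # parses "1952" as 1952-01-01; result is already pl.Date
    .alias('date')
)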
pl_gapminder["date"]
shape: (1704,)
Series: 'date' [date]
[
1952-01-01
1957-01-01
1962-01-01
1967-01-01
1972-01-01
1977-01-01
...