Search code examples
python-polars

Unable to convert string type to only date type format in polars python


I am trying to extract only dates from a string datetime column but not able to get proper results. I am not sure but can a single digit format in date create a problem?

data:

import polars as pl

df = pl.from_repr("""
┌──────┬──────────────────────┐
│ YEAR ┆ FL_DATE              │
│ ---  ┆ ---                  │
│ i64  ┆ str                  │
╞══════╪══════════════════════╡
│ 2023 ┆ 1/1/2023 12:00:00 AM │
│ 2023 ┆ 1/1/2023 12:00:00 AM │
│ 2023 ┆ 1/1/2023 12:00:00 AM │
└──────┴──────────────────────┘
""")

Failed attempts:

#1
df.select(
    pl.col('FL_DATE').alias('date_only').str.to_date("%m/%d/%Y")
)

#2
df.select(
    pl.col('FL_DATE').alias('date_only').str.to_date("%m/%d/%Y").dt.to_string("%Y-%m")
)

#3
df.select(
    pl.col('FL_DATE').alias('date_only').str.to_datetime("%m/%d/%Y",strict=False).cast(pl.Date)
)

#4
df.select(
    pl.col('FL_DATE').cast(pl.String).str.to_datetime("%m/%d/%Y",strict=False
                                                      ).cast(pl.Date).alias('date_only')
)

failed output:

1. InvalidOperationError: conversion from `str` to `date` failed in column 'date_only'
2. InvalidOperationError: conversion from `str` to `date` failed in column 'date_only'
3. null
4. null

Solution

  • The format you're supplying does not match your data - you're missing the timestamp, i.e. %r

    df.select(
       pl.col("FL_DATE").str.to_date("%m/%d/%Y %r")
    )
    
    shape: (3, 1)
    ┌────────────┐
    │ FL_DATE    │
    │ ---        │
    │ date       │
    ╞════════════╡
    │ 2023-01-01 │
    │ 2023-01-01 │
    │ 2023-01-01 │
    └────────────┘
    

    See the Format specification in the chrono docs. (which Polars uses internally)