I am trying to extract only dates
from a string datetime column
but not able to get proper results. I am not sure but can a single digit format in date create a problem?
data:
import polars as pl
df = pl.from_repr("""
┌──────┬──────────────────────┐
│ YEAR ┆ FL_DATE │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════════════════════╡
│ 2023 ┆ 1/1/2023 12:00:00 AM │
│ 2023 ┆ 1/1/2023 12:00:00 AM │
│ 2023 ┆ 1/1/2023 12:00:00 AM │
└──────┴──────────────────────┘
""")
Failed attempts:
#1
df.select(
pl.col('FL_DATE').alias('date_only').str.to_date("%m/%d/%Y")
)
#2
df.select(
pl.col('FL_DATE').alias('date_only').str.to_date("%m/%d/%Y").dt.to_string("%Y-%m")
)
#3
df.select(
pl.col('FL_DATE').alias('date_only').str.to_datetime("%m/%d/%Y",strict=False).cast(pl.Date)
)
#4
df.select(
pl.col('FL_DATE').cast(pl.String).str.to_datetime("%m/%d/%Y",strict=False
).cast(pl.Date).alias('date_only')
)
failed output:
1. InvalidOperationError: conversion from `str` to `date` failed in column 'date_only'
2. InvalidOperationError: conversion from `str` to `date` failed in column 'date_only'
3. null
4. null
The format you're supplying does not match your data - you're missing the timestamp, i.e. %r
df.select(
pl.col("FL_DATE").str.to_date("%m/%d/%Y %r")
)
shape: (3, 1)
┌────────────┐
│ FL_DATE │
│ --- │
│ date │
╞════════════╡
│ 2023-01-01 │
│ 2023-01-01 │
│ 2023-01-01 │
└────────────┘
See the Format specification in the chrono docs. (which Polars uses internally)