Search code examples
pythonpython-polars

How to fix date format issues while reading xlsx files using polars?


Update: Polars now uses fastexcel as the default read_excel engine.

The given example now runs as expected.


I have an excel file with an extension .xlsx. and it has also a field called date_of_birth, this filed is having years from 1860,1870 to till 2000 as below

Here is a command that I used for loading an excel:

import tempfile
import polars as pl
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:

    filename = Path(tmp_dir) / "1.xlsx"

    (pl.from_repr("""
    ┌────────────────┐
    │ date_of_birth  │
    │ ---            │
    │ str            │
    ╞════════════════╡
    │ 1/11/1887      │
    │ 1/11/1887      │
    │ 1/11/1889      │
    │ 1/12/1886      │
    │ 1/13/1889      │
    │ 1/15/1886      │
    │ 1/17/1888      │
    │ 1/18/1889      │
    │ 1/18/1885      │
    │ 1/2/1880       │    
    └────────────────┘
    """)
    .with_columns(pl.col.date_of_birth.str.to_date("%m/%d/%Y"))
    .write_excel(filename))

    df = pl.read_excel(filename)

    print(df)

On running this it gives an error:

XlsxValueError: Error: potential invalid date format.

How to ignore/Fix this error while reading the file so that I would get the data as it is in data frame. Is there any work around for this?

enter image description here


Solution

  • Well, the error is due to strftime function which does not support pre-1900 years.

    Probably polars is using that and it causes the problem.

    You may try not parsing the dates on polar function; so that you can read the CSV file (and dates stay as String). And when you need to parse the dates; just use strptime like:

    datetime.datetime.strptime("1800/04/10", "%Y/%m/%d")
    

    Also, you may try to use with_columns method of polars framework (I couldn't test it yet; will update after trying it):

    df_pl = pl.read_excel("Data_Set_14_Data.xlsx").with_columns(
        pl.col("<last_col_name>").str.to_date("%m/%d/%Y")
    )