Update: Polars now uses fastexcel as the default read_excel engine.
The given example now runs as expected.
I have an excel file with an extension .xlsx. and it has also a field called date_of_birth, this filed is having years from 1860,1870 to till 2000 as below
Here is a command that I used for loading an excel:
import tempfile
import polars as pl
from pathlib import Path
with tempfile.TemporaryDirectory() as tmp_dir:
filename = Path(tmp_dir) / "1.xlsx"
(pl.from_repr("""
┌────────────────┐
│ date_of_birth │
│ --- │
│ str │
╞════════════════╡
│ 1/11/1887 │
│ 1/11/1887 │
│ 1/11/1889 │
│ 1/12/1886 │
│ 1/13/1889 │
│ 1/15/1886 │
│ 1/17/1888 │
│ 1/18/1889 │
│ 1/18/1885 │
│ 1/2/1880 │
└────────────────┘
""")
.with_columns(pl.col.date_of_birth.str.to_date("%m/%d/%Y"))
.write_excel(filename))
df = pl.read_excel(filename)
print(df)
On running this it gives an error:
XlsxValueError: Error: potential invalid date format.
How to ignore/Fix this error while reading the file so that I would get the data as it is in data frame. Is there any work around for this?
Well, the error is due to strftime
function which does not support pre-1900 years.
Probably polars is using that and it causes the problem.
You may try not parsing the dates on polar function; so that you can read the CSV file (and dates stay as String). And when you need to parse the dates; just use strptime
like:
datetime.datetime.strptime("1800/04/10", "%Y/%m/%d")
Also, you may try to use with_columns
method of polars framework (I couldn't test it yet; will update after trying it):
df_pl = pl.read_excel("Data_Set_14_Data.xlsx").with_columns(
pl.col("<last_col_name>").str.to_date("%m/%d/%Y")
)