Is it possible in Polars to read a CSV with German number formatting, the way pandas.read_csv() allows with its "decimal" and "thousands" parameters?
Currently, the Polars read_csv method does not expose those parameters.
However, there is an easy workaround: read the German-formatted numbers as strings (Utf8) first, then convert them. For example, with this tab-separated CSV, Polars infers the numeric columns as strings.
import polars as pl
my_csv = b"""col1\tcol2\tcol3
1.234,5\tabc\t1.234.567
9.876\tdef\t3,21
"""
df = pl.read_csv(my_csv, separator="\t")
print(df)
shape: (2, 3)
┌─────────┬──────┬───────────┐
│ col1    ┆ col2 ┆ col3      │
│ ---     ┆ ---  ┆ ---       │
│ str     ┆ str  ┆ str       │
╞═════════╪══════╪═══════════╡
│ 1.234,5 ┆ abc  ┆ 1.234.567 │
│ 9.876   ┆ def  ┆ 3,21      │
└─────────┴──────┴───────────┘
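From here, the conversion is just a few lines of code: strip the "." thousands separators, turn the decimal comma into a decimal point, and cast.
df = df.with_columns(
    pl.col("col1", "col3")
    .str.replace_all(r"\.", "")  # remove every thousands separator
    .str.replace(",", ".")       # decimal comma -> decimal point (first match only)
    .cast(pl.Float64)            # or whatever datatype is needed
)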
print(df)
shape: (2, 3)
┌────────┬──────┬────────────┐
│ col1   ┆ col2 ┆ col3       │
│ ---    ┆ ---  ┆ ---        │
│ f64    ┆ str  ┆ f64        │
╞════════╪══════╪════════════╡
│ 1234.5 ┆ abc  ┆ 1.234567e6 │
│ 9876.0 ┆ def  ┆ 3.21       │
└────────┴──────┴────────────┘
Just be careful to apply this logic only to numbers encoded in the German locale. It will silently mangle numbers formatted in other locales, because no error is raised when the separators have the opposite meaning.