polars.read_csv() with German number formatting


Is there a way in Polars to read a CSV file with German number formatting, as is possible in pandas.read_csv() with the parameters "decimal" and "thousands"?


Solution

  • Currently, the Polars read_csv method does not expose those parameters.

    However, there is an easy workaround: let Polars read the German-formatted numbers as strings, then convert them. For example, with this CSV:

    import polars as pl
    
    my_csv = b"""col1\tcol2\tcol3
    1.234,5\tabc\t1.234.567
    9.876\tdef\t3,21
    """
    df = pl.read_csv(my_csv, separator="\t")
    print(df)
    
    shape: (2, 3)
    ┌─────────┬──────┬───────────┐
    │ col1    ┆ col2 ┆ col3      │
    │ ---     ┆ ---  ┆ ---       │
    │ str     ┆ str  ┆ str       │
    ╞═════════╪══════╪═══════════╡
    │ 1.234,5 ┆ abc  ┆ 1.234.567 │
    │ 9.876   ┆ def  ┆ 3,21      │
    └─────────┴──────┴───────────┘
    

    From here, the conversion is just a few lines of code:

    df = df.with_columns(
        pl.col("col1", "col3")
        .str.replace_all(r"\.", "")
        .str.replace(",", ".")
        .cast(pl.Float64)  # or whatever datatype needed
    )
    print(df)
    
    shape: (2, 3)
    ┌────────┬──────┬────────────┐
    │ col1   ┆ col2 ┆ col3       │
    │ ---    ┆ ---  ┆ ---        │
    │ f64    ┆ str  ┆ f64        │
    ╞════════╪══════╪════════════╡
    │ 1234.5 ┆ abc  ┆ 1.234567e6 │
    │ 9876.0 ┆ def  ┆ 3.21       │
    └────────┴──────┴────────────┘
    

    Just be careful to apply this logic only to columns whose numbers use German formatting. It will silently mangle numbers formatted for other locales.
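
To make that warning concrete, here is a minimal plain-Python sketch of the same two string substitutions, applied to a single value. The names `german_to_float`, `looks_german`, and `GERMAN_NUMBER` are hypothetical helpers made up for this illustration, not part of Polars, and the regex is only one plausible definition of "German-formatted".

```python
import re

# Assumed pattern for German-style numbers: either digits grouped in threes
# with '.' (e.g. 1.234.567) or plain digits, each with an optional ',' decimal part.
GERMAN_NUMBER = re.compile(r"-?\d{1,3}(\.\d{3})*(,\d+)?|-?\d+(,\d+)?")


def german_to_float(text: str) -> float:
    """Drop every '.' (thousands separator), then turn the first ',' into '.'."""
    return float(text.replace(".", "").replace(",", ".", 1))


def looks_german(text: str) -> bool:
    """Return True if the whole string matches the assumed German pattern."""
    return GERMAN_NUMBER.fullmatch(text) is not None


print(german_to_float("1.234,5"))  # German input: 1234.5
print(german_to_float("1,234.5"))  # US-style input is mangled: 1.2345
print(looks_german("1,234.5"))     # False, so this value could be rejected up front
print(looks_german("9.876"))       # True, although a US reader would mean 9.876
```

The last line shows the limit of such a check: a value like 9.876 is valid in both conventions, so no pattern can tell them apart. Knowing the locale of the source file is still required.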