python pandas decimal parquet

How to change the decimal format from dot to comma when reading parquet files?


I'm working with parquet files and reading them with pd.read_parquet(). However, the numerical values in the file use commas as the decimal separator, and the numbers are being misread.

How can I change the decimal sign from dot to comma?

Here is my code:

import pandas as pd
import pyarrow.parquet as pq

new_col = pq.read_table(filepath).to_pandas()
aux = pd.concat([aux, new_col])

df.head()

                      X_Principal  Y_Principal  value_main  \
ts                                                                     
2016-01-27 15:15:00             1.0             4.0        11.020800   
2016-01-27 15:15:00             1.0             4.0        11.020800   
2016-01-27 15:15:00             1.0             4.0        36.408001   
2016-01-27 15:15:00             1.0             4.0        36.408001   
2016-01-27 15:30:00             1.0             4.0        12.004800 

type(new_col)

<class 'pandas.core.frame.DataFrame'>  

The numbers in the value_main column should be something like 110.20800, for example.


Solution

  • Let's run a minimal reproducible experiment.

    Let's prepare some data:

    In [1]: df = pd.DataFrame({"a": ["1,1", "1,2"], "b": [1, 2]})
    
    In [2]: df.to_parquet("./df.parquet", compression="GZIP") 
    

    Let's check what the file actually contains:

    $ parquet-cat df.parquet
    a = 1,1
    b = 1
    
    a = 1,2
    b = 2
    

    Then, let's read the data and cast the column in question to float:

    In [8]: df1 = pd.read_parquet("./df.parquet")                                                                                          
    
    In [9]: df1                                                                                                                            
    Out[9]: 
         a  b
    0  1,1  1
    1  1,2  2
    
    In [10]: df1.a.str.replace(",",".").astype("float64")                                                                                  
    Out[10]: 
    0    1.1
    1    1.2
    Name: a, dtype: float64
    

    As you can see, this works for a parquet file in which the comma-decimal values are stored as strings.
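If several columns are affected, the same replacement can be wrapped in a small helper. This is only a sketch: `comma_to_float` is a hypothetical name, not a pandas function, and it assumes the values are stored as strings with a comma as the decimal separator and no thousands separator.

```python
import pandas as pd


def comma_to_float(frame, columns):
    """Return a copy of `frame` with the given string columns parsed as floats.

    Hypothetical helper: assumes values such as "1,1", i.e. a comma as the
    decimal separator and no thousands separator.
    """
    out = frame.copy()
    for col in columns:
        # regex=False makes this a plain string substitution
        out[col] = out[col].str.replace(",", ".", regex=False).astype("float64")
    return out


# Same toy data as in the experiment above
df = pd.DataFrame({"a": ["1,1", "1,2"], "b": [1, 2]})
converted = comma_to_float(df, ["a"])
print(converted.dtypes)
```

The helper returns a copy, so the original frame keeps its string column and can be compared against the converted one.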

    PS

    The data you added to your question does not quite match the question itself. I suggest inspecting what is actually stored in the parquet file with a tool such as parquet-tools, and checking whether it is read correctly.