I'm working with parquet files, and I'm using pd.read_parquet() to read them. However, the numerical values in the file use commas as the decimal separator, and the numbers are being misread.
How can I change the decimal sign from dot to comma?
Here is my piece of code:
new_col = pa.parquet.read_table(filepath).to_pandas()
aux = pd.concat([aux, new_col])
df.head()
                     X_Principal  Y_Principal  value_main  \
ts
2016-01-27 15:15:00          1.0          4.0   11.020800
2016-01-27 15:15:00          1.0          4.0   11.020800
2016-01-27 15:15:00          1.0          4.0   36.408001
2016-01-27 15:15:00          1.0          4.0   36.408001
2016-01-27 15:30:00          1.0          4.0   12.004800
type(new_col)
<class 'pandas.core.frame.DataFrame'>
The number in the value column should be something like 110.20800, for example.
Let's do a minimal reproducible experiment.
Let's prepare some data:
In [1]: df = pd.DataFrame({"a": ["1,1", "1,2"], "b": [1, 2]})
In [2]: df.to_parquet("./df.parquet", compression="GZIP")
Let's check what we actually have in the file:
$ parquet-cat df.parquet
a = 1,1
b = 1
a = 1,2
b = 2
Then, let's read the data and cast the column of concern to float:
In [8]: df1 = pd.read_parquet("./df.parquet")
In [9]: df1
Out[9]:
     a  b
0  1,1  1
1  1,2  2
In [10]: df1.a.str.replace(",",".").astype("float64")
Out[10]:
0    1.1
1    1.2
Name: a, dtype: float64
As you can see, this works on a parquet file with comma decimals: the column is read as strings, so replacing the comma with a dot and casting gives proper floats.
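If several columns need the same treatment, the replace-and-cast step can be wrapped in a small helper. This is only a sketch: comma_strings_to_float is a hypothetical name, and it assumes the columns hold strings with a decimal comma and no thousands separator.

```python
import pandas as pd


def comma_strings_to_float(df, columns):
    """Return a copy of df with the given comma-decimal string columns as floats.

    Hypothetical helper: assumes each listed column contains strings such as
    "1,1" (decimal comma, no thousands separator).
    """
    out = df.copy()
    for col in columns:
        # Swap the decimal comma for a dot (literal replace, not a regex),
        # then let pandas parse the result as numbers.
        out[col] = pd.to_numeric(out[col].str.replace(",", ".", regex=False))
    return out


# Same toy data as in the experiment above.
df = pd.DataFrame({"a": ["1,1", "1,2"], "b": [1, 2]})
converted = comma_strings_to_float(df, ["a"])
print(converted.dtypes)
```

The original frame is left untouched, so you can compare the values before and after the conversion.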
PS
The data you added to your question does not quite match the question itself. I think you should take a closer look at what you actually have in the parquet file, with tools like parquet-tools, and check whether it is read correctly.
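One quick way to check this from pandas itself is to look at the dtypes after loading. The sketch below uses a hypothetical helper, summarize_columns, on a toy frame. If the column of concern already shows up as float64, the file stores real numbers and the decimal separator is not the problem.

```python
import pandas as pd


def summarize_columns(df):
    """Report dtype, numeric-ness and a sample value for every column.

    Hypothetical helper for inspection only; in practice df would be the
    frame returned by pd.read_parquet(filepath).
    """
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "is_numeric": [pd.api.types.is_numeric_dtype(df[c]) for c in df.columns],
        "sample": [df[c].iloc[0] if len(df) else None for c in df.columns],
    })


# Toy frame: "a" holds comma-decimal strings, "b" holds real floats.
example = pd.DataFrame({"a": ["1,1", "1,2"], "b": [1.5, 2.5]})
summary = summarize_columns(example)
print(summary)
```

A column reported as non-numeric with a sample like "1,1" needs the replace-and-cast step; a numeric column with unexpected values points to a problem upstream, where the file was written.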