
How to handle Money data type when writing to Parquet

I've been trying to get data from SQL Server, load it into a DataFrame, and write it to Parquet (which I later load into BigQuery or another destination). I have a problem with the money data type. For example, the data in SQL Server is:


but after writing to parquet it converts to:


(Because the data is large, I can't download it locally to check, but maybe write.parquet converts money to int. Please correct me if I'm wrong.)

Here's part of my script:

# Read the source table from SQL Server over JDBC
df = spark.read.format("jdbc") \
    .option("url", "jdbc:sqlserver://{myIP}:1433;instanceName={myInstance};database={myDB};") \
    .option("dbtable", table_source) \
    .option("user", user_source) \
    .option("password", password_source) \
    .option("driver", "") \
    .load()


Should I specify a schema for each column, or is there a better approach?
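
For reference, specifying a type per column on a JDBC read could look roughly like the sketch below. It assumes Spark's JDBC customSchema option, which takes a comma-separated string of "name TYPE" pairs. The helpers build_custom_schema and read_with_money_schema and the column name amount are made up for illustration, and DECIMAL(19, 4) is chosen because SQL Server's money type carries four decimal places. The driver option from the script above is left out here.

```python
def build_custom_schema(column_types):
    """Build the string expected by the JDBC reader's customSchema option.

    column_types maps column name -> Spark SQL type,
    e.g. {"amount": "DECIMAL(19, 4)"}.
    """
    return ", ".join(f"{name} {sql_type}" for name, sql_type in column_types.items())


def read_with_money_schema(spark, url, table, user, password, column_types):
    """Read a table over JDBC, overriding the types of the listed columns.

    Hypothetical helper: spark is an existing SparkSession. Columns not
    listed in column_types keep the type Spark infers by default.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("customSchema", build_custom_schema(column_types))
        .load()
    )


# "amount" is an assumed column name, not one from the original table.
schema_string = build_custom_schema({"amount": "DECIMAL(19, 4)"})
print(schema_string)
```

Calling df.printSchema() on the result lists the column types without pulling any rows, which may help confirm whether the money column is still a decimal before the Parquet write.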


  • I believe this is because the , character is being treated as a decimal point. Can you confirm that the data type in SQL Server is numeric?

    If the type in SQL Server is numeric, you can try manually removing the , and casting to double or string before writing to Parquet. If it's not numeric, you will have to do the casting anyway.
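
    The cleanup suggested here could be sketched as follows. It assumes the value arrives as text such as 1,234.5600 (the sample values from the question are not shown, so that format is a guess), and parse_money and clean_money_column are hypothetical names. Instead of double, the sketch casts to a decimal with four places, which avoids floating-point rounding on currency amounts.

```python
from decimal import Decimal


def parse_money(text):
    """Strip thousands separators and return an exact Decimal with 4 places."""
    cleaned = text.replace(",", "").strip()
    return Decimal(cleaned).quantize(Decimal("0.0001"))


def clean_money_column(df, column):
    """Apply the same cleanup to one column of a Spark DataFrame.

    pyspark is imported lazily so parse_money works without Spark installed.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import DecimalType

    # Cast to string first so this also works if the column is already numeric.
    as_text = F.col(column).cast("string")
    without_commas = F.regexp_replace(as_text, ",", "")
    return df.withColumn(column, without_commas.cast(DecimalType(19, 4)))


print(parse_money("1,234.5600"))
```

    With the DataFrame from the question, the call would be something like df = clean_money_column(df, "amount") before the Parquet write, where amount stands in for the real money column.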