When I load Parquet files into a BigQuery table, the stored values look garbled. It seems to be related to the encoding of BYTES fields, or something along those lines.
Here's the format of the created fields:
So when I read the table and cast those fields, I get the readable values back.
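For reference, a read like the following gives me the readable values (table and column names here are just placeholders, not my real schema):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Casting the BYTES column to STRING makes the values human-readable again
# (this works when the bytes are valid UTF-8).
query = """
    SELECT CAST(my_bytes_col AS STRING) AS my_str_col
    FROM `my-project.my_dataset.my_table`
"""
for row in client.query(query).result():
    print(row.my_str_col)
```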
I found the solution here.
My question is: why is BigQuery behaving like this?
According to this GCP documentation, some Parquet data types can be converted into more than one BigQuery data type. The workaround is to declare, in the Parquet schema, the specific type you want BigQuery to parse the column as.
For example, to convert the Parquet INT32 data type to the BigQuery DATE data type, specify the following:
optional int32 date_col (DATE);
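If you generate the Parquet files yourself, for example with pyarrow, these annotations are driven by the Arrow column types. A minimal sketch (field names are illustrative, not from the question):

```python
import datetime

import pyarrow as pa
import pyarrow.parquet as pq

# The Arrow type determines the Parquet physical/logical type, which in turn
# drives BigQuery's type mapping:
#   pa.string() -> BYTE_ARRAY (STRING annotation) -> BigQuery STRING
#   pa.binary() -> BYTE_ARRAY (no annotation)     -> BigQuery BYTES
#   pa.date32() -> INT32 (DATE annotation)        -> BigQuery DATE
schema = pa.schema([
    pa.field("text_col", pa.string()),
    pa.field("raw_col", pa.binary()),
    pa.field("date_col", pa.date32()),
])

table = pa.table(
    {
        "text_col": ["hello"],
        "raw_col": [b"\x00\x01"],
        "date_col": [datetime.date(2021, 1, 1)],
    },
    schema=schema,
)
pq.write_table(table, "example.parquet")
```

So a column that should end up as STRING in BigQuery must carry the STRING (UTF8) annotation in the Parquet file; a plain BYTE_ARRAY without it is mapped to BYTES, which is the behavior described in the question.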
Another option is to pass an explicit schema to the bq load command:
bq load --source_format=PARQUET --noreplace --noautodetect \
  --parquet_enum_as_string=true --decimal_target_types=STRING \
  [project]:[dataset].[table] gs://[bucket]/[file].parquet Column_name:Data_type
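The same load can also be written with the Python client library; this is a sketch assuming the google-cloud-bigquery package, with placeholder project, dataset, bucket, file, and column names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# --parquet_enum_as_string=true
parquet_options = bigquery.ParquetOptions()
parquet_options.enum_as_string = True

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # --noreplace
    autodetect=False,                                          # --noautodetect
    decimal_target_types=["STRING"],                           # --decimal_target_types=STRING
    schema=[bigquery.SchemaField("date_col", "DATE")],         # Column_name:Data_type
    parquet_options=parquet_options,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/my-file.parquet",
    "my-project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
```

Either way, the point is the same: when a Parquet type can map to several BigQuery types, you have to tell BigQuery which one you want instead of relying on autodetection.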