Search code examples
google-bigquerygoogle-cloud-storage

export bigquery data to cloud storage, the integer field changes to string format but float format stays as numeric format


I have these bigquery fields

views   INTEGER NULLABLE    
comments    INTEGER NULLABLE    
shares  INTEGER NULLABLE    
watch_time_minutes  FLOAT   NULLABLE    

views comments shares watch_time_minutes 0 0 1 0.0

I exported this bigquery data to cloud storage (json format) through BigQuery Console page (simply "Export Table") , what I got is

{"views":"0","comments":"0","shares":"1","watch_time_minutes":0}

Notice in this json record, Integer fields changed to string value, "views":"0","comments":"0","shares":"1"

But the float field stays as numeric format "watch_time_minutes":0

The integer fields in Json should stay as numeric format as

"views":0,"comments":0,"shares":1

Otherwise, in the following process when I do the math operations on these supposed to be integer fields (but no display as string in json), I would have problems.

Any Advice?

Right now, one solution I have is I loop through (millions lines of) the storage file and cast all the integer fields back to the integer. But it beat the performance advantage that "simply export bigquery table to storage by one statement."

I had another ugly solution that is "redefine integer field as float" but this is really ugly and it would cause other problems later on. Such as I could not display views times as 0.0 time. etc.

Thanks!


Solution

  • This has been reported on the issue tracker a few different times, e.g. Json exports on BQ is exported incorrectly. From the last comment on that issue:

    The reason is that most JSON parsers support only 32-bit integers, while BQ integers have 64 bits. You can use the field types in the result to determine how to parse the JSON, or if you want the JSON to use numbers rather than strings, you can cast the columns in a query to FLOAT (legacy SQL) or FLOAT64 (standard SQL). Note that this option may lose precision, however.

    Alternatively, you can use Avro format which supports 64-bit integers.