
Google BigQuery support for Avro logicalTypes


According to Google, there is no support for converting an Avro logicalType into the corresponding BigQuery-specific type (as described here, at the bottom of the page).

However, I'm able to load an Avro file with the following schema:

schema = {
    'name': 'test',
    'namespace': 'testing',
    'type': 'record',
    'fields': [
        {'name': 'test_timestamp', 'type': 'long', 'logicalType': 'timestamp-millis'},
    ],
}

into a BigQuery table with a column of type TIMESTAMP.
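For reference, such a test file could be produced roughly like this (a sketch assuming the fastavro library; uploading it to gs://bucket is a separate step, and per the Avro spec the logicalType annotation nests inside the field's type):

from datetime import datetime, timezone
from fastavro import parse_schema, writer

avro_schema = {
    'name': 'test',
    'namespace': 'testing',
    'type': 'record',
    'fields': [
        # logicalType lives inside the field's type definition
        {'name': 'test_timestamp',
         'type': {'type': 'long', 'logicalType': 'timestamp-millis'}},
    ],
}

records = [{'test_timestamp': datetime(2019, 1, 1, tzinfo=timezone.utc)}]

# fastavro converts the datetime to a long of epoch milliseconds on write
with open('test_file.avro', 'wb') as out:
    writer(out, parse_schema(avro_schema), records)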

The situation is different with the following schema:

schema = {
    'name': 'test',
    'namespace': 'testing',
    'type': 'record',
    'fields': [
        {'name': 'testdate', 'type': 'int', 'logicalType': 'date'},
    ],
} 

and a BigQuery table with a column of type DATE. I used bq load in the following way (in both cases):

bq --location=EU load --source_format=AVRO --project_id=test-project dataset.table "gs://bucket/test_file.avro"

and it failed with the following error:

Field testdate has changed type from DATE to INTEGER

Is there any chance that logicalTypes will be supported by BigQuery, or is there an elegant way to work around this situation? (I'm aware of the workaround where a temporary table is used and a SQL SELECT then casts the TIMESTAMPs to DATEs, but it's not really pretty :P)
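For reference, that staging workaround might look roughly like this with the google-cloud-bigquery Python client (dataset.staging is a hypothetical staging table where testdate was loaded as a plain INTEGER, i.e. days since the Unix epoch):

from google.cloud import bigquery

client = bigquery.Client(project='test-project')

# Convert the raw days-since-epoch INTEGER into a proper DATE and write
# the result into the final table, overwriting any previous contents.
job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string('test-project.dataset.table'),
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
query = """
    SELECT DATE_FROM_UNIX_DATE(testdate) AS testdate
    FROM `test-project.dataset.staging`
"""
client.query(query, job_config=job_config).result()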


Solution

  • Native support for Avro logical types is now publicly available to all BigQuery users. Please refer to the documentation page for more details: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types
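
The linked page documents a --use_avro_logical_types flag for bq load; with it, the load from the question would look roughly like this (a sketch reusing the same project, dataset, and bucket names):

bq --location=EU load --source_format=AVRO --use_avro_logical_types --project_id=test-project dataset.table "gs://bucket/test_file.avro"

The google-cloud-bigquery Python client exposes a corresponding use_avro_logical_types setting on LoadJobConfig.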