python google-bigquery casting google-cloud-dataflow apache-beam

BigQuery table schema types not being translated correctly in Apache Beam


The Python Apache Beam SDK currently has a bug in its BigQuery connector that incorrectly translates BQ TIMESTAMP to BQ DATETIME. It seems to have been fixed, but I suspect the fix is only in a pre-release and not in the latest stable release (2.49.0).

The bug shows up as an error describing a mismatch between the input and output schemas after conversion. The error only occurs when using the Storage Write API; the legacy streaming API works fine.

The SDK converts LOGICAL_TYPE<beam:logical_type:micros_instant:v1> to DATETIME instead of TIMESTAMP. Has anyone found a workaround to use until this (relatively) new bug is fixed?


Solution

  • For anyone with the same issue, add these lines before writing to BigQuery:

    # imports
    from apache_beam.typehints.schemas import LogicalType, MillisInstant

    # logical type mapping: re-register MillisInstant so that Beam's Timestamp
    # maps to it rather than to micros_instant
    LogicalType.register_logical_type(MillisInstant)
    

    The Apache Beam devs are aware of the issue and are working to find a more permanent solution.