Search code examples
sqlgoogle-bigquerygoogle-datastream

Tables with unapplied upsert data cannot be queried through prefix


In BigQuery, I have several tables with the same name prefix. The suffix of each table is a month, written in format yyyy_mm like 2023_09.

These tables are created by Google Datastream, copied over from PostgreSQL partitioned table.

For the previous months I have used the following query to create a view to combine these tables

SELECT
  *,
  REPLACE(_TABLE_SUFFIX, '_', '-') AS month,
FROM
  `my_project_id.my_dataset.my_table_*`
WHERE
  REGEXP_CONTAINS(_TABLE_SUFFIX, '^[0-9]{4}_[0-9]{2}$')

It worked well, but recently I encountered a new error in BigQuery which stops me from querying the view.

Tables with unapplied upsert data cannot be queried through prefix. First table with upsert data is my_project_id:my_dataset.my_table_2022_09.

I don't know how to fix this or what workaround can be applied, except writing all tables explicitly in the query.


Solution

  • I've raised a Google Support Case and received an answer that this is a known issue, but the current error message is not very accurate. There will be a support for this feature.

    The suggested workaround is to use the ‘UNION’, the easiest way to query all tables.

    I created the view and added a fixed value column to each table. Filtering by this column reduces the scanned bytes.

    CREATE VIEW some.view as ( 
      SELECT *, '2023-01' as month, 
      FROM `my_project_id.my_dataset.my_table_2023_01`
      UNION ALL
      SELECT *, '2023-02' as month, 
      FROM `my_project_id.my_dataset.my_table_2023_02`
      UNION ALL
      SELECT *, '2023-03' as month, 
      FROM `my_project_id.my_dataset.my_table_2023_03`
    );
    
    SELECT * FROM some.view WHERE month = '2023-03'