Search code examples
google-bigquery

create partitioned table via bigquery query


I have an existing table that I wish to filter then add a partition. For a minimal example, consider the table toy below:

#The base table
CREATE OR REPLACE TABLE tmp.toy AS
(
  SELECT 1 as part_col, 1 as col2 UNION ALL
  SELECT 1 as part_col, 2 as col2 UNION ALL
  SELECT 2 as part_col, 3 as col2 UNION ALL
  SELECT 2 as part_col, 4 as col2 UNION ALL
  SELECT 2 as part_col, 5 as col2 
)

How can I run a query on this table to partition toy by part_col?


Things I have tried

  1. per the docs, I tried the following:
CREATE OR REPLACE TABLE tmp.part_example
PARTITION BY part_col
AS
(
  SELECT * 
  FROM tmp.toy
  WHERE col2 < 5
)
;

This yielded the following error: Query error: PARTITION BY expression must be DATE(<timestamp_column>), DATE(<datetime_column>), DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR), a DATE column, TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR), DATE_TRUNC(<date_column>, MONTH/YEAR), or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>[, <int64_value>])) at [11:1]

  1. In light of this error I changed the second line to as follows: PARTITION BY INT64(part_col). This only yielded additional errors.

Solution

  • As per my comment mentioned earlier, to overcome the error PARTITION BY expression must be a RANGE_BUCKET to be able to generate the expected results as per your base table example. Following range partition example docmuentation can be checked for your reference.

    The query is as follows after making the changes:

    
    
    #The base table
    
    CREATE OR REPLACE TABLE tmp.toy AS
    (
    SELECT 1 as part_col, 1 as col2 UNION ALL
    SELECT 1 as part_col, 2 as col2 UNION ALL
    SELECT 2 as part_col, 3 as col2 UNION ALL
    SELECT 2 as part_col, 4 as col2 UNION ALL
    SELECT 2 as part_col, 5 as col2
    
    )
    
    
    
    
    
    CREATE OR REPLACE TABLE tmp.part_example
    PARTITION BY RANGE_BUCKET(part_col, GENERATE_ARRAY(0, 10, 1))
    AS
    (
    SELECT *
    FROM tmp.toy
    WHERE col2 < 5
    )
    ;