Search code examples
google-bigquerydatabase-partitioning

Partition by week/month//quarter/year to get over the partition limit?


I have 32 years of data that I want to put into a partitioned table. However BigQuery says that I'm going over the limit (4000 partitions).

For a query like:

CREATE TABLE `deleting.day_partition`
PARTITION BY FlightDate 
AS 
SELECT *
FROM `flights.original` 

I'm getting an error like:

Too many partitions produced by query, allowed 2000, query produces at least 11384 partitions

How can I get over this limit?


Solution

  • Instead of partitioning by day, you could partition by week/month/year.

    In my case each year of data contains around ~3GB of data, so I'll get the most benefits from clustering if I partition by year.

    For this, I'll create a year date column, and partition by it:

    CREATE TABLE `fh-bigquery.flights.ontime_201903`
    PARTITION BY FlightDate_year
    CLUSTER BY Origin, Dest 
    AS
    SELECT *, DATE_TRUNC(FlightDate, YEAR) FlightDate_year
    FROM `fh-bigquery.flights.raw_load_fixed`
    

    Note that I created the extra column DATE_TRUNC(FlightDate, YEAR) AS FlightDate_year in the process.

    Table stats:

    enter image description here

    Since the table is clustered, I'll get the benefits of partitioning even if I don't use the partitioning column (year) as a filter:

    SELECT *
    FROM `fh-bigquery.flights.ontime_201903`
    WHERE FlightDate BETWEEN '2008-01-01' AND '2008-01-10'
    
    Predicted cost: 83.4 GB
    Actual cost: 3.2 GB