I have 32 years of data that I want to put into a partitioned table. However BigQuery says that I'm going over the limit (4000 partitions).
For a query like:
CREATE TABLE `deleting.day_partition`
PARTITION BY FlightDate
AS
SELECT *
FROM `flights.original`
I'm getting an error like:
Too many partitions produced by query, allowed 2000, query produces at least 11384 partitions
How can I get over this limit?
Instead of partitioning by day, you could partition by week/month/year.
In my case each year of data contains around ~3GB of data, so I'll get the most benefits from clustering if I partition by year.
For this, I'll create a year
date column, and partition by it:
CREATE TABLE `fh-bigquery.flights.ontime_201903`
PARTITION BY FlightDate_year
CLUSTER BY Origin, Dest
AS
SELECT *, DATE_TRUNC(FlightDate, YEAR) FlightDate_year
FROM `fh-bigquery.flights.raw_load_fixed`
Note that I created the extra column DATE_TRUNC(FlightDate, YEAR) AS FlightDate_year
in the process.
Table stats:
Since the table is clustered, I'll get the benefits of partitioning even if I don't use the partitioning column (year) as a filter:
SELECT *
FROM `fh-bigquery.flights.ontime_201903`
WHERE FlightDate BETWEEN '2008-01-01' AND '2008-01-10'
Predicted cost: 83.4 GB
Actual cost: 3.2 GB