When clustering my Wikipedia pageviews tables I got the error:
Clustering encountered a key that is longer than the maximum allowed limit of 1024 bytes.
Context: https://medium.com/google-cloud/bigquery-optimized-cluster-your-tables-65e2f684594b
(I'm clustering by
CREATE TABLE `fh-bigquery.wikipedia_v3.pageviews_2017`
PARTITION BY DATE(datehour)
CLUSTER BY wiki, title
...
)
When clustering tables BigQuery has a limit of 1KB for the keys.
You can solve this for the example tables by changing your insertion code so it truncates any entry that's too long.
For example, instead of:
INSERT INTO `fh-bigquery.wikipedia_v3.pageviews_2018` (datehour, wiki, title, views)
SELECT datehour, wiki, title, views
truncate the potentially long titles with:
INSERT INTO `fh-bigquery.wikipedia_v3.pageviews_2018` (datehour, wiki, title, views)
SELECT datehour, wiki, SUBSTR(title, 0, 300) title, views
If you continue to experience errors, note that some malformed strings might have a longer length than what SUBSTR()
sees. Filter those out with:
WHERE BYTE_LENGTH(title) < 300