Search code examples
google-bigqueryclustered-index

BigQuery: "Clustering encountered a key that is longer than"


When clustering my Wikipedia pageviews tables I got the error:

Clustering encountered a key that is longer than the maximum allowed limit of 1024 bytes.

Context: https://medium.com/google-cloud/bigquery-optimized-cluster-your-tables-65e2f684594b

(I'm clustering by

CREATE TABLE `fh-bigquery.wikipedia_v3.pageviews_2017`
PARTITION BY DATE(datehour)
CLUSTER BY wiki, title
...

)


Solution

  • When clustering tables BigQuery has a limit of 1KB for the keys.

    You can solve this for the example tables by changing your insertion code so it truncates any entry that's too long.

    For example, instead of:

    INSERT INTO `fh-bigquery.wikipedia_v3.pageviews_2018` (datehour, wiki, title, views)
    SELECT datehour, wiki, title, views
    

    truncate the potentially long titles with:

    INSERT INTO `fh-bigquery.wikipedia_v3.pageviews_2018` (datehour, wiki, title, views)
    SELECT datehour, wiki, SUBSTR(title, 0, 300) title, views
    

    If you continue to experience errors, note that some malformed strings might have a longer length than what SUBSTR() sees. Filter those out with:

    WHERE BYTE_LENGTH(title) < 300