Search code examples
google-bigquerylegacy-sql

BigQuery Error – UNIQUE_HEAP requires an int32 argument


Using legacy SQL, I am trying to use COUNT(DISTINCT field, n) in Google BigQuery. But I am get following error:

UNIQUE_HEAP requires an int32 argument which is greater than 0 (error code: invalidQuery)

Here is my query that I have used:

SELECT
    hits.page.pagePath AS Page,
    COUNT(DISTINCT CONCAT(fullVisitorId, INTEGER(visitId)), 1e6) AS UniquePageviews,
    COUNT(DISTINCT fullVisitorId, 1e6) as Users
FROM
    [xxxxxxxx.ga_sessions_20170101]
GROUP BY
    Page
ORDER BY
    UniquePageviews DESC
LIMIT
    20

BigQuery is not even showing line number of error therefore I am not sure which line is causing this error.

What could be possible cause of above error?


Solution

  • Don't use 1e6 in your COUNT(DISTINCT). Instead, use an actual INTEGER value for the 2nd parameter 'N' (default is 1000), or use EXACT_COUNT_DISTINCT() instead.

    COUNT(DISTINCT) documentation

    EXACT_COUNT_DISTINCT() documentation

    If you require greater accuracy from COUNT(DISTINCT), you can specify a second parameter, n, which gives the threshold below which exact results are guaranteed. By default, n is 1000, but if you give a larger n, you will get exact results for COUNT(DISTINCT) up to that value of n. However, giving larger values of n will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.

    To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.