Using legacy SQL, I am trying to use COUNT(DISTINCT field, n)
in Google BigQuery. But I am get following error:
UNIQUE_HEAP requires an int32 argument which is greater than 0 (error code: invalidQuery)
Here is my query that I have used:
SELECT
hits.page.pagePath AS Page,
COUNT(DISTINCT CONCAT(fullVisitorId, INTEGER(visitId)), 1e6) AS UniquePageviews,
COUNT(DISTINCT fullVisitorId, 1e6) as Users
FROM
[xxxxxxxx.ga_sessions_20170101]
GROUP BY
Page
ORDER BY
UniquePageviews DESC
LIMIT
20
BigQuery is not even showing line number of error therefore I am not sure which line is causing this error.
What could be possible cause of above error?
Don't use 1e6
in your COUNT(DISTINCT)
. Instead, use an actual INTEGER value for the 2nd parameter 'N'
(default is 1000), or use EXACT_COUNT_DISTINCT()
instead.
EXACT_COUNT_DISTINCT() documentation
If you require greater accuracy from COUNT(DISTINCT), you can specify a second parameter, n, which gives the threshold below which exact results are guaranteed. By default, n is 1000, but if you give a larger n, you will get exact results for COUNT(DISTINCT) up to that value of n. However, giving larger values of n will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.
To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.