Search code examples
sqlgoogle-bigquerylegacy-sql

Big Query - different number of users when using legacy and normal sql


I have written a query in Google Big Query and want to get the same number of users I see in Google Analytics. I used Legacy and Normal SQL and got 3 different users numbers while the sessions were the same. What did I do wrong, or does anyone have an explanation/solution for it? Every help is appreciated!

Normal SQL

SELECT COUNT(DISTINCT fullVisitorId) AS users, SUM(IF(totals.visits IS 
NULL,0,totals.visits)) AS sessions
FROM `XXX.XXX.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20181120' AND '20181120'

Legacy SQL

SELECT COUNT(DISTINCT fullVisitorId) AS users, SUM(IF(totals.visits IS 
NULL,0,totals.visits)) AS sessions
FROM TABLE_DATE_RANGE([XXX:XXX.ga_sessions_], TIMESTAMP('2018-11-20'), 
TIMESTAMP('2018-11-20'))

Solution

  • I think this warning from the documentation explains what is happening:

    In legacy SQL, COUNT(DISTINCT x) returns an approximate count. In standard SQL, it returns an exact count.

    StandardSQL has the correct number. You can test this by attempting to use EXACT_COUNT_DISTINCT() in legacy SQL.