I calculated the retention i BigQuery with code bellow. Code was taken from here. But this code is giving me different retention then the retention already calculated in firebase. Number of users calculated in BigQuery is always smaller.
What is the difference between this two approaches? Is there a way to get the same result in BigQuery as it is in Firebase?
#standardSQL
####################################################################
# PART 1: Cohort of New Users starting on SEPT 1
####################################################################
WITH
new_user_cohort AS (
SELECT DISTINCT user_pseudo_id as new_user_id
FROM
`projectId.analytics_YOUR_TABLE.events_*`
WHERE
event_name = 'first_open' AND
#geo.country = 'France' AND
FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+8")) = '20180901' AND
_TABLE_SUFFIX BETWEEN '20180830' AND '20180902'),
num_new_users AS (
SELECT count(*) as num_users_in_cohort FROM new_user_cohort
),
####################################################################
# PART 2: Engaged users from Sept 1 cohort
####################################################################
engaged_user_by_day AS (
SELECT
FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+8")) as event_day, COUNT (DISTINCT user_pseudo_id) as num_engaged_users
FROM
`projectId.analytics_YOUR_TABLE.events_*` INNER JOIN new_user_cohort on new_user_id = user_pseudo_id
WHERE
event_name = 'user_engagement' AND
_TABLE_SUFFIX BETWEEN '20180830' AND '20180907'
GROUP BY (event_day)
)
####################################################################
# PART 3: Daily Retention = [Engaged Users / Total Users]
####################################################################
SELECT event_day, num_engaged_users, num_users_in_cohort, ROUND((num_engaged_users / num_users_in_cohort), 3) as retention_rate
FROM engaged_user_by_day CROSS JOIN num_new_users
ORDER BY (event_day)
I found out that that analytics is using sampling and in my report inside analytics it only uses 0.2% of the data.
In firebase I noticed that they removed retention tab (at least in my case). But I think there sampling was also used.