Search code examples
google-bigqueryfirebase-analytics

Mismatch of retention results from firebase and BigQuery


I calculated the retention i BigQuery with code bellow. Code was taken from here. But this code is giving me different retention then the retention already calculated in firebase. Number of users calculated in BigQuery is always smaller.

What is the difference between this two approaches? Is there a way to get the same result in BigQuery as it is in Firebase?

#standardSQL
####################################################################
# PART 1: Cohort of New Users starting on SEPT 1
####################################################################
WITH 
new_user_cohort AS (  
  SELECT DISTINCT user_pseudo_id as new_user_id
  FROM 
    `projectId.analytics_YOUR_TABLE.events_*`
  WHERE
   event_name = 'first_open' AND
   #geo.country = 'France' AND
   FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+8")) = '20180901' AND
  _TABLE_SUFFIX BETWEEN '20180830' AND '20180902'),
num_new_users AS (
  SELECT count(*) as num_users_in_cohort  FROM new_user_cohort
),

####################################################################
# PART 2: Engaged users from Sept 1 cohort
####################################################################
engaged_user_by_day AS (  
  SELECT 
    FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+8")) as event_day, COUNT (DISTINCT user_pseudo_id) as num_engaged_users
  FROM 
    `projectId.analytics_YOUR_TABLE.events_*` INNER JOIN new_user_cohort on new_user_id = user_pseudo_id 
  WHERE
    event_name = 'user_engagement' AND
    _TABLE_SUFFIX BETWEEN '20180830' AND '20180907'
  GROUP BY (event_day)
)

####################################################################
# PART 3: Daily Retention = [Engaged Users / Total Users]
####################################################################
SELECT event_day, num_engaged_users, num_users_in_cohort, ROUND((num_engaged_users / num_users_in_cohort), 3) as retention_rate
  FROM engaged_user_by_day CROSS JOIN num_new_users
ORDER BY (event_day)

Solution

  • I found out that that analytics is using sampling and in my report inside analytics it only uses 0.2% of the data.

    In firebase I noticed that they removed retention tab (at least in my case). But I think there sampling was also used.