Search code examples
firebasegoogle-bigqueryfirebase-analyticshttp-status-code-400

Optimizing a Dynamic 7-Day Cohort Firebase BigQuery


I wrote the following query below against our mobile app's data. Due to a high user-base, I am getting a 400 request error "Resources exceeded during query execution: The query could not be executed in the allotted memory" when I add the ORDER BY at the bottom.

Question: Is there anything that I can do to optimize the query, but still retain the ORDER BY at the bottom?

I already added in the firebase's demo data-set, but I think their data-set is just too small to have a problem (compared to my data-set which is 5-10 million records big).

SELECT 
  f.user_pseudo_id,
  f.event_timestamp, 
  DATE(TIMESTAMP_MICROS(f.event_timestamp)) as event_timestamp_date,
  f.event_name,
  f.user_first_touch_timestamp,
  DATE(TIMESTAMP_MICROS(f.user_first_touch_timestamp)) as user_first_touch_date,
  CASE WHEN r.has_appRemove >= 1 THEN "removed" ELSE "not-removed" END AS status_after_first7days
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*` f
LEFT JOIN (
    SELECT user_pseudo_id, 1 has_appRemove
    FROM `firebase-analytics-sample-data.ios_dataset.app_events_*`
    WHERE DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
      AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
      AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
      AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
      AND platform = "ANDROID"
      AND event_name = "app_remove"
    GROUP BY user_pseudo_id
    ) r on f.user_pseudo_id = r.user_pseudo_id
WHERE
  DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
  AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
  AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
  AND platform = "ANDROID" 
ORDER BY 1,2 ASC

Solution

  • You can apply windowing/analytical function instead of join'ing - like in below example (not tested)

    #standardSQL
    SELECT 
      user_pseudo_id,
      event_timestamp, 
      DATE(TIMESTAMP_MICROS(event_timestamp)) AS event_timestamp_date,
      event_name,
      user_first_touch_timestamp,
      DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_date,
      COUNTIF(event_name = "app_remove") OVER(PARTITION BY user_pseudo_id) > 0 isRemoved
    FROM `firebase-analytics-sample-data.ios_dataset.app_events_*` 
    WHERE
      DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
      AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
      AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
      AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
      AND platform = "ANDROID" 
    ORDER BY 1,2 ASC