Search code examples
sqlperformancegoogle-bigquerywindow-functionsself-join

Is there a way to change this BigQuery self-join to use a window function?


Let's say I have a BigQuery table "events" (in reality this is a slow sub-query) that stores the count of events per day, by event type. There are many types of events and most of them don't occur on most days, so there is only a row for day/event type combinations with a non-zero count.

I have a query that returns the count for each event type and day and the count for that event from N days ago, which looks like this:

WITH events AS (
  SELECT DATE('2019-06-08') AS day, 'a' AS type, 1 AS count
  UNION ALL SELECT '2019-06-09', 'a', 2
  UNION ALL SELECT '2019-06-10', 'a', 3
  UNION ALL SELECT '2019-06-07', 'b', 4
  UNION ALL SELECT '2019-06-09', 'b', 5
)
SELECT e1.type, e1.day, e1.count, COALESCE(e2.count, 0) AS prev_count
FROM events e1
LEFT JOIN events e2 ON e1.type = e2.type AND e1.day = DATE_ADD(e2.day, INTERVAL 2 DAY) -- LEFT JOIN, because the event may not have occurred at all 2 days ago
ORDER BY 1, 2

The query is slow. BigQuery best practices recommend using window functions instead of self-joins. Is there a way to do this here? I could use the LAG function if there was a row for each day, but there isn't. Can I "pad" it somehow? (There isn't a short list of possible event types. I could of course join to SELECT DISTINCT type FROM events, but that probably won't be faster than the self-join.)


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT *, IFNULL(FIRST_VALUE(count) OVER (win), 0) prev_count
    FROM `project.dataset.events`
    WINDOW win AS (PARTITION BY type ORDER BY UNIX_DATE(day) RANGE BETWEEN 2 PRECEDING AND 2 PRECEDING)   
    

    If t apply to sample data from your question - result is:

    Row day         type    count   prev_count   
    1   2019-06-08  a       1       0    
    2   2019-06-09  a       2       0    
    3   2019-06-10  a       3       1    
    4   2019-06-07  b       4       0    
    5   2019-06-09  b       5       4