Search code examples
sqlgoogle-bigquerypivotmin

BigQuery - Select the minimum value of a column dependant upon the value in another column


Say I have a table like this

userId  eventType   timing
647     'jump'      32.7
123     'skip'      13.1
647     'skip'      24.4
433     'jump'      12.7 
433     'skip'      53.6 
647     'jump'      2.4 
647     'jump'      64.4 
123     'skip'      14.0
433     'jump'      4.3 
123     'jump'      18.6

I would like to output a table with one row per userId with columns as userId, the minimum timing where eventType was 'skip' for that userId and the minimum timing where eventType was 'jump' for the same userId. Like this.

userID  first_skip  first_jump  
647     24.4        2.4
123     13.1        18.6
433     53.6        4.3

I realise I can do it with joins.

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 647 userId, 'jump' eventType, 32.7 timing UNION ALL
  SELECT 123, 'skip', 13.1 UNION ALL
  SELECT 647, 'skip', 24.4 UNION ALL
  SELECT 433, 'jump', 12.7 UNION ALL
  SELECT 433, 'skip', 53.6 UNION ALL
  SELECT 647, 'jump', 2.4 UNION ALL
  SELECT 647, 'jump', 64.4 UNION ALL
  SELECT 123, 'skip', 14.0 UNION ALL
  SELECT 433, 'jump', 4.3 UNION ALL
  SELECT 123, 'jump', 18.6
)

SELECT 
  raw.userID, 
  MIN(skips.timing) AS first_skip,
  MIN(jumps.timing) AS first_jump,
FROM `project.dataset.table` AS raw
LEFT JOIN `project.dataset.table` AS skips ON raw.userId = skips.userId
LEFT JOIN `project.dataset.table` AS jumps ON raw.userId = jumps.userId
WHERE skips.eventType = 'skip' AND jumps.eventType = 'jump'
GROUP BY userId

However, my actual data is pretty big and there are a few more eventType categories meaning that the query takes for ever to process. I'm wondering if there is a nicer and more efficient way to do this that doesn't use joins. Maybe using window or partition?


Solution

  • Use conditional aggregation:

    select userid,
        min(case when eventtype = 'skip' then timing end) first_skip,
        min(case when eventtype = 'jump' then timing end) first_jump
    from mytable
    group by userid