Say I have a table like this
userId eventType timing
647 'jump' 32.7
123 'skip' 13.1
647 'skip' 24.4
433 'jump' 12.7
433 'skip' 53.6
647 'jump' 2.4
647 'jump' 64.4
123 'skip' 14.0
433 'jump' 4.3
123 'jump' 18.6
I would like to output a table with one row per userId with columns as userId, the minimum timing where eventType was 'skip' for that userId and the minimum timing where eventType was 'jump' for the same userId. Like this.
userID first_skip first_jump
647 24.4 2.4
123 13.1 18.6
433 53.6 4.3
I realise I can do it with joins.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 647 userId, 'jump' eventType, 32.7 timing UNION ALL
SELECT 123, 'skip', 13.1 UNION ALL
SELECT 647, 'skip', 24.4 UNION ALL
SELECT 433, 'jump', 12.7 UNION ALL
SELECT 433, 'skip', 53.6 UNION ALL
SELECT 647, 'jump', 2.4 UNION ALL
SELECT 647, 'jump', 64.4 UNION ALL
SELECT 123, 'skip', 14.0 UNION ALL
SELECT 433, 'jump', 4.3 UNION ALL
SELECT 123, 'jump', 18.6
)
SELECT
raw.userID,
MIN(skips.timing) AS first_skip,
MIN(jumps.timing) AS first_jump,
FROM `project.dataset.table` AS raw
LEFT JOIN `project.dataset.table` AS skips ON raw.userId = skips.userId
LEFT JOIN `project.dataset.table` AS jumps ON raw.userId = jumps.userId
WHERE skips.eventType = 'skip' AND jumps.eventType = 'jump'
GROUP BY userId
However, my actual data is pretty big and there are a few more eventType categories meaning that the query takes for ever to process. I'm wondering if there is a nicer and more efficient way to do this that doesn't use joins. Maybe using window
or partition
?
Use conditional aggregation:
select userid,
min(case when eventtype = 'skip' then timing end) first_skip,
min(case when eventtype = 'jump' then timing end) first_jump
from mytable
group by userid