I have a table of analytics events and I'm trying to calculate the time difference between two rows, i.e. the time it takes a user between trying to start and actually starting.
My data looks like this:
# | session | type | recordedAt |
---|---|---|---|
1 | D4E77C | feedbackProvided | 2021-08-17T09:13:00.768+03:00 |
2 | D4E77C | feedbackProvided | 2021-08-17T12:06:03.301+03:00 |
3 | D4E77C | feedbackProvided | 2021-08-17T14:28:15.083+03:00 |
4 | D4E77C | feedbackProvided | 2021-08-17T14:28:17.12+03:00 |
5 | D4E77C | buttonClicked | 2021-08-17T14:28:18.383+03:00 |
6 | D4E77C | measurementStarted | 2021-08-17T14:28:22.437+03:00 |
7 | D4E77C | buttonClicked | 2021-08-17T14:28:23.572+03:00 |
8 | D4E77C | measurementCancelled | 2021-08-17T14:28:23.573+03:00 |
These are just the rows for a given session, assume there are lots of sessions.
I'm trying to calculate the difference in recordedAt
between the first feedbackProvided and the first measurementStarted. However, I only want the first feedbackProvided to be considered if it's within, say, 3 minutes of the measurementStarted. So in this case, we'd look at the difference between 1 and 6 but the time is >3 minutes. 2 and 6, time is >3 minutes. 3 and 6, time is ~7 seconds.
I've been looking at some partitioning for the first time and I'm close but I can't figure out the 3 minute maximum time difference.
Am I on the right lines here?
WITH firstFeedbackProvided AS (
SELECT
session, type, recordedAt,
ROW_NUMBER() over(partition by session order by recordedAt) rn
FROM events
WHERE type='feedbackProvided'
),
firstMeasurementStarted AS (
SELECT
session, type, recordedAt,
ROW_NUMBER() over(partition by session order by recordedAt) rn
FROM events
WHERE type='measurementStarted'
)
SELECT
*,
date_diff('millisecond', t1.recordedAt, t2.recordedAt) as diff
FROM firstFeedbackProvided as t1
JOIN firstMeasurementStarted as t2 ON t1.session = t2.session
WHERE t1.rn = 1
AND t2.rn = 1
I think you are overcomplicating the problem. Do the following:
In SQL, this looks like:
select session,
first_measurementStarted - min(recordedat)
from (select e.*,
min(case when type = 'measurementStarted' then recordedat end) over (partition by session) as first_measurementStarted
from events e
) e
where recordedat > first_measurementStarted - interval '3' minute and
type = 'feedbackProvided'
group by session, first_measurementStarted;