Search code examples
sqlpartitioningamazon-athena

SQL time difference between two rows in a partition window


I have a table of analytics events and I'm trying to calculate the time difference between two rows, i.e. the time it takes a user between trying to start and actually starting.

My data looks like this:

# session type recordedAt
1 D4E77C feedbackProvided 2021-08-17T09:13:00.768+03:00
2 D4E77C feedbackProvided 2021-08-17T12:06:03.301+03:00
3 D4E77C feedbackProvided 2021-08-17T14:28:15.083+03:00
4 D4E77C feedbackProvided 2021-08-17T14:28:17.12+03:00
5 D4E77C buttonClicked 2021-08-17T14:28:18.383+03:00
6 D4E77C measurementStarted 2021-08-17T14:28:22.437+03:00
7 D4E77C buttonClicked 2021-08-17T14:28:23.572+03:00
8 D4E77C measurementCancelled 2021-08-17T14:28:23.573+03:00

These are just the rows for a given session, assume there are lots of sessions.

I'm trying to calculate the difference in recordedAt between the first feedbackProvided and the first measurementStarted. However, I only want the first feedbackProvided to be considered if it's within, say, 3 minutes of the measurementStarted. So in this case, we'd look at the difference between 1 and 6 but the time is >3 minutes. 2 and 6, time is >3 minutes. 3 and 6, time is ~7 seconds.

I've been looking at some partitioning for the first time and I'm close but I can't figure out the 3 minute maximum time difference.

Am I on the right lines here?

WITH firstFeedbackProvided AS (
  SELECT 
    session, type, recordedAt,
    ROW_NUMBER() over(partition by session order by recordedAt) rn
  FROM events
  WHERE type='feedbackProvided'
),
firstMeasurementStarted AS (
  SELECT 
    session, type, recordedAt,
    ROW_NUMBER() over(partition by session order by recordedAt) rn
  FROM events
  WHERE type='measurementStarted'
)
SELECT 
  *,
  date_diff('millisecond', t1.recordedAt, t2.recordedAt) as diff
FROM firstFeedbackProvided as t1
JOIN firstMeasurementStarted as t2 ON t1.session = t2.session
WHERE t1.rn = 1
AND t2.rn = 1

Solution

  • I think you are overcomplicating the problem. Do the following:

    1. Calculate when the first measurement occurred for each session.
    2. Filter the rows to only include feedback events before this within your timeframe.
    3. Aggregate

    In SQL, this looks like:

    select session,
           first_measurementStarted - min(recordedat) 
    from (select e.*,
                 min(case when type = 'measurementStarted' then recordedat end) over (partition by session) as first_measurementStarted
          from events e 
         ) e
    where recordedat > first_measurementStarted - interval '3' minute and
         type = 'feedbackProvided'
    group by session, first_measurementStarted;