Search code examples
sqlprestotrino

Assign flag if successful retry within 1 second


I am trying to assign flag to each entry in the data based on successful retry within 1 second. Here is the sample data

date       id  event            url  code  event_ts
2021-08-20 11  1629515037.0682  xyz  503   2021-08-20 20:03:57.068
2021-08-20 11  1629515037.1073  xyz  200   2021-08-20 20:03:57.107  -- successful retry within 1 sec
2021-08-20 12  1629515037.1866  abc  503   2021-08-20 20:03:57.187
2021-08-20 12  1629515037.1942  abc  503   2021-08-20 20:03:57.194
2021-08-20 12  1629515037.2037  abc  503   2021-08-20 20:03:57.204
2021-08-20 12  1629515037.2249  abc  503   2021-08-20 20:03:57.225
2021-08-20 12  1629515064.2427  abc  200   2021-08-20 20:04:24.243  -- successful retry within 1 sec

I want to create a new column retry where,

if code = 503, successful retry within 1 sec -> successful_retry
if code = 503, successful retry after 1 sec -> successful_retry_after_1_sec
if code = 503, no successful retry at all -> no_successful_retry

I am primarily a Python/Pandas person but need to solve this in presto. I tried using LEAD() but wan't able to write a solution with variable offset. Would appreciate pointers

Edit: Based on @Gordon Linoff's answer

SELECT
    date,
    id,
    url,
    event,
    FROM_UNIXTIME(event) AS event_ts,
    code,
    (
        CASE
            WHEN code <= 399 THEN 'successful_response'
            WHEN MIN(CASE WHEN code <= 399 THEN FROM_UNIXTIME(event) END) OVER (
                PARTITION BY
                    date,
                    id,
                    url
                ORDER BY
                    date,
                    id,
                    url,
                    event rows BETWEEN CURRENT ROW AND UNBOUNDED following
            ) <= FROM_UNIXTIME(event) + INTERVAL '1' SECOND THEN 'success_retry_within_1_sec'
            WHEN MIN(CASE WHEN code <= 399 THEN FROM_UNIXTIME(event) END) OVER (
                PARTITION BY
                    date,
                    id,
                    url
                ORDER BY
                    date,
                    id,
                    url,
                    event rows BETWEEN CURRENT ROW AND UNBOUNDED following
            ) > FROM_UNIXTIME(event) + INTERVAL '1' SECOND THEN 'success_retry_after_1_sec'
            ELSE 'No_successful_retry'
        END
    ) AS successful_retry_flag
FROM t

Solution

  • If I assume that 200 is a successful retry, you can get the next successful retry using a cumulative min. The rest is date arithmetic to set the flag:

    select t.*,
           (case when min(case when code = 200 then event_ts end) over
                          (partition by id
                           order by event_ts
                           rows between current row and unbounded following
                          ) < event_ts + interval '1' second
                 then 1 else 0
            end) as successful_retry_flag
    from t;