I am trying to assign flag to each entry in the data based on successful retry within 1 second. Here is the sample data
date id event url code event_ts
2021-08-20 11 1629515037.0682 xyz 503 2021-08-20 20:03:57.068
2021-08-20 11 1629515037.1073 xyz 200 2021-08-20 20:03:57.107 -- successful retry within 1 sec
2021-08-20 12 1629515037.1866 abc 503 2021-08-20 20:03:57.187
2021-08-20 12 1629515037.1942 abc 503 2021-08-20 20:03:57.194
2021-08-20 12 1629515037.2037 abc 503 2021-08-20 20:03:57.204
2021-08-20 12 1629515037.2249 abc 503 2021-08-20 20:03:57.225
2021-08-20 12 1629515064.2427 abc 200 2021-08-20 20:04:24.243 -- successful retry within 1 sec
I want to create a new column retry where,
if code = 503, successful retry within 1 sec -> successful_retry
if code = 503, successful retry after 1 sec -> successful_retry_after_1_sec
if code = 503, no successful retry at all -> no_successful_retry
I am primarily a Python/Pandas person but need to solve this in presto. I tried using LEAD() but wan't able to write a solution with variable offset. Would appreciate pointers
Edit: Based on @Gordon Linoff's answer
SELECT
date,
id,
url,
event,
FROM_UNIXTIME(event) AS event_ts,
code,
(
CASE
WHEN code <= 399 THEN 'successful_response'
WHEN MIN(CASE WHEN code <= 399 THEN FROM_UNIXTIME(event) END) OVER (
PARTITION BY
date,
id,
url
ORDER BY
date,
id,
url,
event rows BETWEEN CURRENT ROW AND UNBOUNDED following
) <= FROM_UNIXTIME(event) + INTERVAL '1' SECOND THEN 'success_retry_within_1_sec'
WHEN MIN(CASE WHEN code <= 399 THEN FROM_UNIXTIME(event) END) OVER (
PARTITION BY
date,
id,
url
ORDER BY
date,
id,
url,
event rows BETWEEN CURRENT ROW AND UNBOUNDED following
) > FROM_UNIXTIME(event) + INTERVAL '1' SECOND THEN 'success_retry_after_1_sec'
ELSE 'No_successful_retry'
END
) AS successful_retry_flag
FROM t
If I assume that 200 is a successful retry, you can get the next successful retry using a cumulative min. The rest is date arithmetic to set the flag:
select t.*,
(case when min(case when code = 200 then event_ts end) over
(partition by id
order by event_ts
rows between current row and unbounded following
) < event_ts + interval '1' second
then 1 else 0
end) as successful_retry_flag
from t;