Search code examples
sqlwindow-functionslagpresto

Presto/SQL find next value that meets a condition


I have a table looks like below, first 3 columns are in the original table and last 2 columns are what I want. So what I want are:

  1. find if within the next 30 days of a record if an event of "issued" happens
  2. if "issued" happens, what's the stamp of the "issued"

I was able to find if issued happened by replacing "issued" in the original table with 1 and used a self join and sum the "replaced issued" by sum(case when t2.event_ts between t1.event_ts and date_add('day',30,t1.event_ts) then t2.issued end) as issued_in_month.

But I'm not sure how to get the "issued" timestamp.

How can I do this in Presto? Thanks

| event_id                         | event_ts      | event_name | issued_within_30_days | issued_ts     |
|----------------------------------|---------------|------------|-----------------------|---------------|
| fc25df28c8bb188e8d15c1fea4acd06a | 6/1/18 21:31  | CLICK      | 1                     | 6/27/18 10:18 |
| 2b48d5ea80829cdd1014a68631160eb0 | 6/6/18 17:25  | OPEN       | 1                     | 6/27/18 10:18 |
| f4f6d9af7217031e5579c6d741fe3d21 | 6/8/18 0:00   | MAIL_DROP  | 1                     | 6/27/18 10:18 |
| 4a5af238e28688bf899def0905210550 | 6/10/18 4:25  | OPEN       | 1                     | 6/27/18 10:18 |
| a7945e7087c27d7f6bd7c66a397bf6bf | 6/25/18 17:53 | CLICK      | 1                     | 6/27/18 10:18 |
| 82eba7f62a9398347d67073291547765 | 6/27/18 4:58  | OPEN       | 1                     | 6/27/18 10:18 |
| 6d6984822e98009a0a30deda1204a5e0 | 6/27/18 4:58  | CLICK      | 1                     | 6/27/18 10:18 |
| 6229316a4ebcb56124e1180c9e2f60be | 6/27/18 10:18 | ISSUED     | 1                     | 6/27/18 10:18 |
| 3994ed57abc330e16965f13a2a37edd8 | 6/27/18 11:19 | SEND       | 0                     |               |
| 70f0a4d73975f26523415625309f1f64 | 7/4/18 17:30  | SEND       | 0                     |               |
| a7dd44648a0ce0029ad1ec7106fcf95c | 7/4/18 18:37  | OPEN       | 0                     |               |
| 8a823394754276a46fa3021ddc7a7290 | 7/6/18 0:00   | MAIL_DROP  | 0                     |               |
| 7a4c451554ba08da59a85a3fa61de7b6 | 7/6/18 20:18  | SEND       | 0                     |               |
| f992b7a5a25f765e5fd6f7dde129f9dc | 7/7/18 1:16   | OPEN       | 0                     |               |
| 9370e4901a4796371827e2f45862ba66 | 7/11/18 4:35  | CLICK      | 0                     |               |
| 96badd7253344f79c7ac2a47032d5942 | 2/22/19 17:20 | ISSUED     | 0                     |               |

Solution

  • The simplest method might be a cumulative min and some case expressions:

    select t.*,
           (case when next_issued_ts < event_ts + interval '30' day
                 then 1 else 0
            end) as issued_within_30_days,
           (case when next_issued_ts < event_ts + interval '30' day
                 then next_issued_ts
            end) as issued_ts
    from (select t.*,
                 min(case when event = 'issued' then event_ts end) over
                     (order by event_ts desc) as next_issued_ts
          from t
         ) t