Search code examples
sqlvertica

Vertica Analytic function to count instances in a window


Let's say I have a dataset with two columns: ID and timestamp. My goal is to count return IDs that have at least n timestamps in any 30 day window.

Here is an example:

ID Timestamp
1  '2019-01-01'
2  '2019-02-01'
3  '2019-03-01'
1  '2019-01-02'
1  '2019-01-04'
1  '2019-01-17'

So, let's say I want to return a list of IDs that have 3 timestamps in any 30 day window.

Given above, my resultset would just be ID = 1. I'm thinking some kind of windowing function would accomplish this, but I'm not positive.

Any chance you could help me write a query that accomplishes this?


Solution

  • A relatively simple way to do this involves lag()/lead():

    select t.*
    from (select t.*,
                 lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
          from t
         ) t
    where datediff(day, timestamp, timestamp_2) <= 30;
    

    The lag() looks at the third timestamp in a series. The where checks if this is within 30 days of the original one. The result is rows where this occurs.

    If you just want the ids, then:

    select distinct id
    from (select t.*,
                 lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
          from t
         ) t
    where datediff(day, timestamp, timestamp_2) <= 30;