Search code examples
sqlimpalamemory-efficient

Efficiently find "isolated" rows in sql


Say we have a table with following schema:

| ID            | DATE          | VALUE |
| ------------- |:-------------:| -----:|
| 1             | '2016-08-01'  |  1600 |
| 9             | '2016-03-03'  |    12 |
| 1             | '2016-08-21'  |     1 |
| 4             | '2016-09-01'  |     1 |
| ....                ....         .... |

How to efficiently find all rows for which table doesn't contain records with same ID in N days after row's DATE?

Simply put, the goal is to find "last action" for each user (there were no actions from the user for at least N days after)


Solution

  • You could use the lead window function. This will perform better than using a correlated subquery:

    select *
    from   ( select id, date, value,
                    lead(date) over (partition by id order by date) next_date
             from   mytable ) as detail
    where  date < date_sub(next_date, 30) or  next_date is null
    

    This assumes your date field is a timestamp. If it is a string, then use datediff.

    Note that the next_date is null part ensures that you also get the most recent user record in the result set, as obviously it has no date following that is too soon after.