Search code examples
sqlsql-serverdatabasewindow-functionsgaps-and-islands

Get users who took ride for 3 or more consecutive dates


I have below table, it shows user_id and ride_date.

+---------+------------+
| user_id | ride_date  |
+---------+------------+
|       1 | 2019-11-01 |
|       1 | 2019-11-03 |
|       1 | 2019-11-05 |
|       2 | 2019-11-03 |
|       2 | 2019-11-04 |
|       2 | 2019-11-05 |
|       2 | 2019-11-06 |
|       3 | 2019-11-03 |
|       3 | 2019-11-04 |
|       3 | 2019-11-05 |
|       3 | 2019-11-06 |
|       4 | 2019-11-05 |
|       4 | 2019-11-07 |
|       4 | 2019-11-08 |
|       4 | 2019-11-09 |
|       5 | 2019-11-11 |
|       5 | 2019-11-13 |
+---------+------------+

I want user_id who took rides for 3 or more consecutive days along with days on which they took consecutive rides

The desired result is as below

+---------+-----------------------+
| user_id | consecutive_ride_date |
+---------+-----------------------+
|       2 | 2019-11-03            |
|       2 | 2019-11-04            |
|       2 | 2019-11-05            |
|       2 | 2019-11-06            |
|       3 | 2019-11-03            |
|       3 | 2019-11-04            |
|       3 | 2019-11-05            |
|       3 | 2019-11-06            |
|       4 | 2019-11-08            |
|       4 | 2019-11-09            |
|       4 | 2019-11-10            |
+---------+-----------------------+

SQL Fiddle


Solution

  • Here is one way to adress this gaps-and-island problem:

    • first, assign a rank to each user ride with row_number(), and recover the previous ride_date (aliased lag_ride_date)

    • then, compare the date of the previous ride to the current one in a conditional sum, that increases when the dates are successive ; by comparing this with the rank of the user ride, you get groups (aliased grp) that represent consecutive rides with a 1 day spacing

    • do a window count how many records belong to each group (aliased cnt)

    • filter on records whose window count is greater than 3

    Query:

    select user_id, ride_date
    from (
        select 
            t.*,
            count(*) over(partition by user_id, grp) cnt
        from (
            select
                t.*,
                rn1 
                    - sum(case when ride_date = dateadd(day, 1, lag_ride_date) then 1 else 0 end)
                    over(partition by user_id order by ride_date) grp
            from (
                select 
                    t.*,
                    row_number() over(partition by user_id order by ride_date) rn1,
                    lag(ride_date) over(partition by user_id order by ride_date) lag_ride_date
                from Table1 t
            ) t
        ) t
    ) t
    where cnt >= 3
    

    Demo on DB Fiddle