Search code examples
sqlpostgresqlgroup-bywindow-functionsgaps-and-islands

How to you identify date ranges/intervals for specified values?


I have a set of data that tells me the owner for something for each date, sample data below. There are some breaks in the date column.

|    owner    |    date     |
|-------------+-------------+
|  Samantha   |  2010-01-02 |
|  Max        |  2010-01-03 |
|  Max        |  2010-01-04 |
|  Max        |  2010-01-06 |
|  Max        |  2010-01-07 |
|  Conor      |  2010-01-08 |
|  Conor      |  2010-01-09 |
|  Conor      |  2010-01-10 |
|  Conor      |  2010-01-11 |
|  Abigail    |  2010-01-12 |
|  Abigail    |  2010-01-13 |
|  Abigail    |  2010-01-14 |
|  Abigail    |  2010-01-15 |
|  Max        |  2010-01-17 |
|  Max        |  2010-01-18 |
|  Abigail    |  2010-01-20 |
|  Conor      |  2010-01-21 |

I am trying to write a query that can capture date ranges for when each owner's interval.. such as

|    owner    |    start   |     end    |
|-------------+------------+------------+
|  Samantha   | 2010-01-02 | 2010-01-02 |
|  Max        | 2010-01-03 | 2010-01-04 |
|  Max        | 2010-01-06 | 2010-01-07 |
|  Conor      | 2010-01-08 | 2010-01-11 |
|  Abigail    | 2010-01-12 | 2010-01-15 |
|  Max        | 2010-01-17 | 2010-01-18 |
|  Abigail    | 2010-01-20 | 2010-01-20 |
|  Conor      | 2010-01-21 | 2010-01-21 |

I tried think of this using min() and max() but I am stuck. I feel like I need to use lead() and lag() but not sure how to use them to get the output I want. Any ideas? Thanks in advance!


Solution

  • This is a typical gaps-and-island problem. Here is one way to solve it using row_number():

    select owner, min(date) start, max(date) end
    from (
        select 
            owner,
            row_number() over(order by date) rn1,
            row_number() over(partition by owner, order by date) rn2
        from mytable
    ) t
    group by owner, rn1 - rn2
    

    This works by ranking records by date over two different partitions (within the whole table and within groups having the same owner). The difference between the ranks gives you the group each record belongs to. You can run the inner query and look at the results to understand the logic.