I have a set of data that tells me the owner
for something for each date
, sample data below. There are some breaks in the date
column.
| owner | date |
|-------------+-------------+
| Samantha | 2010-01-02 |
| Max | 2010-01-03 |
| Max | 2010-01-04 |
| Max | 2010-01-06 |
| Max | 2010-01-07 |
| Conor | 2010-01-08 |
| Conor | 2010-01-09 |
| Conor | 2010-01-10 |
| Conor | 2010-01-11 |
| Abigail | 2010-01-12 |
| Abigail | 2010-01-13 |
| Abigail | 2010-01-14 |
| Abigail | 2010-01-15 |
| Max | 2010-01-17 |
| Max | 2010-01-18 |
| Abigail | 2010-01-20 |
| Conor | 2010-01-21 |
I am trying to write a query that can capture date ranges for when each owner's
interval.. such as
| owner | start | end |
|-------------+------------+------------+
| Samantha | 2010-01-02 | 2010-01-02 |
| Max | 2010-01-03 | 2010-01-04 |
| Max | 2010-01-06 | 2010-01-07 |
| Conor | 2010-01-08 | 2010-01-11 |
| Abigail | 2010-01-12 | 2010-01-15 |
| Max | 2010-01-17 | 2010-01-18 |
| Abigail | 2010-01-20 | 2010-01-20 |
| Conor | 2010-01-21 | 2010-01-21 |
I tried think of this using min()
and max()
but I am stuck. I feel like I need to use lead()
and lag()
but not sure how to use them to get the output I want. Any ideas? Thanks in advance!
This is a typical gaps-and-island problem. Here is one way to solve it using row_number()
:
select owner, min(date) start, max(date) end
from (
select
owner,
row_number() over(order by date) rn1,
row_number() over(partition by owner, order by date) rn2
from mytable
) t
group by owner, rn1 - rn2
This works by ranking records by date
over two different partitions (within the whole table and within groups having the same owner
). The difference between the ranks gives you the group each record belongs to. You can run the inner query and look at the results to understand the logic.