I'm stumped. I have time series data in a SQL Server database. Each row represents when a customer has stayed at the hotel. However, sometimes a customer will extend a stay, change rooms etc, which gets recorded in the system as a separate booking. For our purposes, I need to consider those contiguous 'bookings' as a continued stay. Consider the data:
| CUSTOMERID | STARTDATE | ENDDATE |
| -------- | -----------| -------- |
| 1 |2021-07-02 |2021-07-14|
| 1 |2021-07-19 |2021-07-27|
| 2 |2018-11-12 |2018-11-16|
| 2 |2018-11-17 |2018-11-19|
| 2 |2018-11-19 |2018-11-25|
| 2 |2019-01-10 |2019-01-15|
What I need to do is roll up any two records where the STARTDATE
of the next visit for a customer is <=1 day difference from the ENDDATE
of the current record. In other words, there must be a full calendar day between visits to consider them distinct. I need the final table to look something like this:
| CUSTOMERID | STARTDATE | ENDDATE |NEWSTARTDATE|NEWENDDATE|
| -------- | -----------| -------- |------------|----------|
| 1 |2021-07-02 |2021-07-14|2021-07-02 |2021-07-14|
| 1 |2021-07-19 |2021-07-27|2021-07-19 |2021-07-27|
| 2 |2018-11-12 |2018-11-16|2018-11-12 |2018-11-25|
| 2 |2018-11-17 |2018-11-19|2018-11-12 |2018-11-25|
| 2 |2018-11-19 |2018-11-25|2018-11-12 |2018-11-25|
| 2 |2019-01-10 |2019-01-15|2019-01-10 |2019-01-15|
Thanks to @lptr for the answer, which includes a window function in a case statement of the subquery. See link below:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f2c905f2c31675c179d9f2100d74f44f
create table t(CUSTOMERID int, STARTDATE date, ENDDATE date);
insert into t(CUSTOMERID , STARTDATE , ENDDATE )
values
(1, '2021-07-02', '2021-07-14'),
(1, '2021-07-19', '2021-07-27'),
(2, '2018-11-12', '2018-11-16'),
(2, '2018-11-17', '2018-11-19'),
(2, '2018-11-19', '2018-11-25'),
(2, '2019-01-10', '2019-01-15');
select CUSTOMERID , STARTDATE , ENDDATE,
max(sdate) over(partition by CUSTOMERID order by STARTDATE ) as NewStartDate,
min(edate) over(partition by CUSTOMERID order by STARTDATE rows between current row and unbounded following) as NewEndDate
from
(
select *,
case when STARTDATE <= lag(dateadd(day, 1, ENDDATE)) over(partition by CUSTOMERID order by STARTDATE ) then null
else STARTDATE
end as sdate,
case when ENDDATE>= lead(dateadd(day, -1, STARTDATE )) over(partition by CUSTOMERID order by STARTDATE ) then null
else ENDDATE
end as edate
from t
) as t
order by STARTDATE ;