Search code examples
sqlsql-servertime-serieswindow-functionsaggregation

Re-calculate start and end dates based on length of time elapsed between records (i.e., rows)


I'm stumped. I have time series data in a SQL Server database. Each row represents when a customer has stayed at the hotel. However, sometimes a customer will extend a stay, change rooms etc, which gets recorded in the system as a separate booking. For our purposes, I need to consider those contiguous 'bookings' as a continued stay. Consider the data:

| CUSTOMERID | STARTDATE  | ENDDATE  |
| --------   | -----------| -------- |
| 1          |2021-07-02  |2021-07-14|
| 1          |2021-07-19  |2021-07-27|
| 2          |2018-11-12  |2018-11-16|
| 2          |2018-11-17  |2018-11-19|
| 2          |2018-11-19  |2018-11-25|
| 2          |2019-01-10  |2019-01-15|

What I need to do is roll up any two records where the STARTDATE of the next visit for a customer is <=1 day difference from the ENDDATE of the current record. In other words, there must be a full calendar day between visits to consider them distinct. I need the final table to look something like this:

| CUSTOMERID | STARTDATE  | ENDDATE  |NEWSTARTDATE|NEWENDDATE|
| --------   | -----------| -------- |------------|----------|
| 1          |2021-07-02  |2021-07-14|2021-07-02  |2021-07-14|
| 1          |2021-07-19  |2021-07-27|2021-07-19  |2021-07-27|
| 2          |2018-11-12  |2018-11-16|2018-11-12  |2018-11-25|
| 2          |2018-11-17  |2018-11-19|2018-11-12  |2018-11-25|
| 2          |2018-11-19  |2018-11-25|2018-11-12  |2018-11-25|
| 2          |2019-01-10  |2019-01-15|2019-01-10  |2019-01-15| 

Solution

  • Thanks to @lptr for the answer, which includes a window function in a case statement of the subquery. See link below:

    https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f2c905f2c31675c179d9f2100d74f44f

    create table t(CUSTOMERID int, STARTDATE date, ENDDATE date);
    insert into t(CUSTOMERID , STARTDATE , ENDDATE )
    values
    (1, '2021-07-02', '2021-07-14'),
    (1, '2021-07-19', '2021-07-27'),
    (2, '2018-11-12', '2018-11-16'),
    (2, '2018-11-17', '2018-11-19'),
    (2, '2018-11-19', '2018-11-25'),
    (2, '2019-01-10', '2019-01-15');
    
    
    select CUSTOMERID , STARTDATE , ENDDATE,
     max(sdate) over(partition by CUSTOMERID order by STARTDATE ) as NewStartDate,
     min(edate) over(partition by CUSTOMERID order by STARTDATE rows between current row and unbounded following) as NewEndDate
    from
    (
    select *, 
     case when STARTDATE <= lag(dateadd(day, 1, ENDDATE)) over(partition by CUSTOMERID order by STARTDATE ) then null 
          else STARTDATE 
     end as sdate,
     case when ENDDATE>= lead(dateadd(day, -1, STARTDATE )) over(partition by CUSTOMERID order by STARTDATE ) then null 
          else ENDDATE
     end as edate
    from t
    ) as t
    order by STARTDATE ;