Search code examples
sqldatedb2

Selecting records within specified date range based on previous row in group


From the following dataset, I need to find all records whose begin_date occurs within 30 days of the end_date of the previous record (by date) for each client.

I need to compare row 1 for client X to row 2 (if exists), row 2 to row 3 (if exists) and so on.

CLIENTID BEGIN_DATE END_DATE


  14  2016-12-07 11:00:00.0  2017-04-06 09:28:00.0
  14  2017-04-07 06:00:00.0  2017-11-30 09:46:00.0
  22  2016-04-20 03:45:00.0  2017-11-30 09:47:00.0
  25  2016-07-25 04:00:00.0  2017-11-30 09:46:00.0
 121  2017-08-23 07:00:00.0  2017-11-30 09:45:00.0
2004  2017-12-31 08:58:00.0  2018-01-05 18:35:00.0
2004  2018-04-04 00:40:00.0  2018-04-11 18:29:00.0
2004  2018-05-11 03:37:00.0  2018-05-18 14:40:00.0
2004  2018-05-31 10:22:00.0  2018-06-06 09:25:00.0
2004  2018-08-26 06:27:00.0  2018-09-03 13:00:00.0
2034  2020-03-23 23:54:00.0  2020-04-01 13:29:00.0
2046  2018-01-10 07:21:00.0  2018-01-16 14:42:00.0
2072  2021-12-20 12:08:00.0  2021-12-23 13:22:00.0
2080  2019-01-04 12:55:00.0  2019-01-08 13:43:00.0
2082  2019-06-05 01:43:00.0  2019-06-10 16:15:00.0
2091  2018-02-16 10:28:00.0  2018-02-21 18:11:00.0
2091  2018-05-23 09:55:00.0  2018-05-31 13:37:00.0
2113  2018-10-03 08:10:00.0  2018-10-08 13:13:00.0
2148  2022-11-29 18:23:00.0  2022-12-13 10:10:00.0
2148  2023-02-14 17:08:00.0  2023-02-14 17:09:00.0
2148  2023-02-14 20:41:00.0  2023-02-24 11:11:00.0
2148  2023-06-24 08:05:00.0  2023-08-03 11:19:00.0
2148  2023-12-21 08:59:00.0  2023-12-30 11:05:00.0
2149  2018-03-07 05:28:00.0  2018-04-11 09:44:00.0
2154  2018-07-16 04:12:00.0  2018-08-07 16:33:00.0
2154  2018-09-03 02:14:00.0  2018-09-11 20:42:00.0
2154  2018-10-16 12:47:00.0  2019-05-30 11:21:00.0
2190  2018-02-06 01:40:00.0  2018-02-15 11:50:00.0
2195  2021-08-11 14:58:00.0  2021-08-17 09:52:00.0
2197  2018-02-19 09:47:00.0  2018-02-23 08:04:00.0
2197  2018-07-18 05:38:00.0  2018-07-25 11:33:00.0
2197  2018-10-04 03:00:00.0  2018-10-15 12:18:00.0
2197  2019-01-27 04:31:00.0  2019-01-31 09:36:00.0
2198  2018-02-09 02:57:00.0  2018-02-21 13:01:00.0
2227  2018-06-11 01:21:00.0  2018-06-14 11:57:00.0
2227  2019-10-29 14:04:00.0  2019-11-04 16:32:00.0
2282  2021-02-11 18:53:00.0  2021-03-04 10:23:00.0
2306  2020-01-28 12:57:00.0  2020-02-03 19:15:00.0
2315  2019-05-22 17:37:00.0  2019-05-23 19:30:00.0
2324  2018-07-25 01:35:00.0  2018-07-31 12:51:00.0
2330  2020-01-13 15:55:00.0  2020-01-21 18:27:00.0
2330  2020-03-04 23:40:00.0  2020-03-10 17:46:00.0
2356  2018-06-08 03:59:00.0  2018-06-15 12:38:00.0
2415  2017-12-18 07:06:00.0  2017-12-21 17:00:00.0
2423  2018-10-10 01:14:00.0  2018-10-17 15:46:00.0
2423  2019-09-16 21:53:00.0  2019-09-19 13:55:00.0
2460  2023-11-13 04:17:00.0  2023-11-16 11:17:00.0
2469  2019-09-14 09:36:00.0  2019-09-20 15:19:00.0
2497  2019-05-14 23:27:00.0  2019-05-22 11:13:00.0
2531  2017-12-26 07:46:00.0  2018-01-03 19:53:00.0
2531  2018-05-07 06:02:00.0  2018-05-14 20:54:00.0
2533  2019-02-18 19:43:00.0  2019-02-28 15:23:00.0
2533  2019-04-30 20:34:00.0  2019-09-12 08:29:00.0
2567  2018-01-25 01:49:00.0  2018-03-06 08:59:00.0
2567  2018-11-06 10:57:00.0  2018-11-17 13:22:00.0
2567  2018-12-23 10:38:00.0  2019-01-09 15:54:00.0
2567  2019-04-19 20:22:00.0  2019-07-06 16:18:00.0
2585  2018-01-02 11:35:00.0  2018-01-12 11:50:00.0

Solution

  • OLAP functions - specially LAG and LEAD can be helpful here. Check out the following query listing the difference in days:

    SELECT clientid, BEGIN_DATE, END_DATE 
         , lag(END_DATE) OVER (PARTITION BY clientid ORDER BY END_DATE) AS lag_END_DATE
         , days_Between(BEGIN_DATE, lag(END_DATE) OVER (PARTITION BY clientid ORDER BY END_DATE)) AS Difference_in_days
      FROM client
    

    By filtering the result you could get what you described. I intentionally left lag_END_DATE in the resultset so you can see the intermediate step.