Search code examples
sqlamazon-redshiftwindow-functionsgaps-and-islands

How to do partition by using date intervals in SQL


I have the following sample data with 3 columns, i.e., account_id, interval_start, interval_end. And I want to create a reactivated column with the given output in the following table.

For each account_id, reactivated column should start with a 0 value and then I want to check if the previous interval_end and current interval_start value is more than a month then I want to increment that value as in to 1 and then next time when previous interval_end and current interval_start is more than a month, then it should be 2 and so on.

account_id interval_start interval_end reactivated
ABC 2015-01-01 00:00:00.000 2015-02-01 00:00:00.000 0
ABC 2015-02-01 00:00:00.000 2015-03-01 00:00:00.000 0
ABC 2015-03-01 00:00:00.000 2015-04-01 00:00:00.000 0
ABC 2015-09-01 00:00:00.000 2015-10-01 00:00:00.000 1
ABC 2015-10-01 00:00:00.000 2015-11-01 00:00:00.000 1
ABC 2017-04-01 00:00:00.000 2017-05-01 00:00:00.000 2
ABC 2017-05-01 00:00:00.000 2017-06-01 00:00:00.000 2
ABC 2017-06-01 00:00:00.000 2017-07-01 00:00:00.000 2
ABC 2017-07-01 00:00:00.000 2017-08-01 00:00:00.000 2
ABC 2019-03-01 00:00:00.000 2019-04-01 00:00:00.000 3
ABC 2019-04-01 00:00:00.000 2019-05-01 00:00:00.000 3
ABC 2019-05-01 00:00:00.000 2019-06-01 00:00:00.000 3
ABC 2021-07-01 00:00:00.000 2021-08-01 00:00:00.000 4
ABC 2021-08-01 00:00:00.000 2021-09-01 00:00:00.000 4
DEF 2015-03-01 00:00:00.000 2015-04-01 00:00:00.000 0
DEF 2015-04-01 00:00:00.000 2015-05-01 00:00:00.000 0
DEF 2017-06-01 00:00:00.000 2017-07-01 00:00:00.000 1
DEF 2017-07-01 00:00:00.000 2017-08-01 00:00:00.000 1
DEF 2019-05-01 00:00:00.000 2019-06-01 00:00:00.000 2

So far I'm only able to get 0s and 1s in the reactivated column by doing the following but its not what I want:

case when
interval_start - lag(interval_end) over (partition by account_id order by interval_start) as prev_interval_end > 30 then 1 else 0 end

How do I get the value to increment as well for each account_id please.


Solution

  • Adding to above answer. Since I'm using Redshift what worked for me is:

    sum(case when interval_start > lag_interval_end then 1 else 0 end) over (partition by account_id order by interval_start rows unbounded preceding) as  grp