I have data stored on the DB as gaps and islands and I need to join it with another table. the gaps and islands table like the below:
|Subs_ID|ORIGINAL_STATUS|NEW_STATUS|CHANGE_DATE|
|-------+---------------+----------+-----------|
|123456 |1 |2 |12/2/2017 |
|123456 |2 |3 |12/8/2019 |
|123456 |3 |4 |12/18/2019 |
|123456 |4 |8 |12/28/2019 |
|123456 |8 |9 |10/4/2020 |
the second table only includes the Subs_ID and sequenced Connect_date
|Subs_ID|CONNECT_DATE|
|-------+------------|
|123456 |12/1/2017 |
|123456 |12/3/2017 |
|123456 |11/4/2018 |
|123456 |10/5/2019 |
|123456 |12/30/2019 |
|123456 |10/4/2020 |
|123456 |5/21/2021 |
I need to join the current STATUS
from the first table to the second one using the subs_id
and the dates
. and the result will be the below where if the Connect_date
less than the Change_Date
will take the first ORIGINAL_STATUS
and vs if the Connect_date
more than the Change_Date
will take the last NEW_STATUS
|Subs_ID|CONNECT_DATE|Status|
|-------+------------+------|
|123456 |12/1/2017 |1 |
|123456 |12/3/2017 |2 |
|123456 |11/4/2018 |2 |
|123456 |10/5/2019 |2 |
|123456 |12/30/2019 |8 |
|123456 |10/4/2020 |8 |
|123456 |5/21/2021 |9 |
I hope this query helps you. Code breakdown and explanation given below
select a.*,B.ORIGINAL_STATUS
from mytable2 a
join (select *,LAG(CHANGE_DATE) OVER(ORDER BY CHANGE_DATE) as previous_date_value
from mytable)b
on a.Subs_ID =b.Subs_ID and a.CONNECT_DATE >b.previous_date_value and a.CONNECT_DATE<= b.CHANGE_DATE
UNION
select * from (
select a.*,CASE WHEN A.CONNECT_dATE<B.CHANGE_DATE THEN b.ORIGINAL_STATUS ELSE NULL END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number() over(partition by Subs_ID order by CHANGE_DATE)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)less_than_available_date
where status is not null
UNION
select * from (
select a.*,CASE WHEN A.CONNECT_dATE>B.CHANGE_DATE THEN b.NEW_STATUS ELSE NULL END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number() over(partition by Subs_ID order by CHANGE_DATE desc)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)greater_than_available_date
where status is not null
Explanation: First block of code with return the status of the connect date if the connect date is present with in the range of available change dates. I am using lag function to fetch previous change date value and comparing with connect date.
select a.*,B.ORIGINAL_STATUS
from mytable2 a
join (select *,LAG(CHANGE_DATE) OVER(ORDER BY CHANGE_DATE) as previous_date_value
from mytable)b
on a.Subs_ID =b.Subs_ID and a.CONNECT_DATE >b.previous_date_value and a.CONNECT_DATE<= b.CHANGE_DATE
This gives us the following result set
+---------+--------------+-----------------+
| Subs_ID | CONNECT_DATE | ORIGINAL_STATUS |
+---------+--------------+-----------------+
| 123456 | 2017-12-03 | 2 |
| 123456 | 2018-11-04 | 2 |
| 123456 | 2019-10-05 | 2 |
| 123456 | 2019-12-30 | 8 |
| 123456 | 2020-10-04 | 8 |
+---------+--------------+-----------------+
Now we want to find the status of 12/1/2017 which is lesser than the available change date. table 2 is joined with table 1 which has the row of minimum change date and ORIGINAL_STATUS is taken if the connect date is lesser than change date..
select * from (
select a.*,CASE WHEN A.CONNECT_dATE<B.CHANGE_DATE THEN b.ORIGINAL_STATUS ELSE NULL END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number() over(partition by Subs_ID order by CHANGE_DATE)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)less_than_available_date
where status is not null
+---------+--------------+--------+
| Subs_ID | CONNECT_DATE | STATUS |
+---------+--------------+--------+
| 123456 | 2017-12-01 | 1 |
+---------+--------------+--------+
Left over is the records with connect date greater than available change date. This is achieved by below code and table 2 is joined with table 1 having the row of max change date and new status is taken if the connect date is greater than change date.
select * from (
select a.*,CASE WHEN A.CONNECT_dATE>B.CHANGE_DATE THEN b.NEW_STATUS ELSE NULL
END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number()
over(partition by Subs_ID order by CHANGE_DATE desc)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)greater_than_available_date
where status is not null
+---------+--------------+--------+
| Subs_ID | CONNECT_DATE | STATUS |
+---------+--------------+--------+
| 123456 | 2021-05-21 | 9 |
+---------+--------------+--------+
Finally after applying union we get the required resultset
+---------+--------------+-----------------+
| Subs_ID | CONNECT_DATE | ORIGINAL_STATUS |
+---------+--------------+-----------------+
| 123456 | 2017-12-01 | 1 |
| 123456 | 2017-12-03 | 2 |
| 123456 | 2018-11-04 | 2 |
| 123456 | 2019-10-05 | 2 |
| 123456 | 2019-12-30 | 8 |
| 123456 | 2020-10-04 | 8 |
| 123456 | 2021-05-21 | 9 |
+---------+--------------+-----------------+