I have a table, Table A, in SQL Server that is CDC enabled by Application Programming.
Table A looks like this in Month 1.
Policy_ID DOB Gender Occupation CDC_Ind Load_date Act_Ind
1 290691 M ABC I 01/01/2019 Y
2 290692 M ABC I 01/01/2019 Y
3 290693 F ABC I 01/01/2019 Y
4 290694 M ABC I 01/01/2019 Y
5 290695 F ABC I 01/01/2019 Y
I receive a new file at the beginning of the next month, Feb 1.
That Feb file looks like this.
Policy_ID DOB Gender Occupation
1 290691 M ABC
2 290692 M ABC
4 290693 M ABC
5 290694 F DEF
10 290791 M ABC
It can be seen that Policy_ID = 3
is not in the new file and Policy_ID = 10
, a new ID
is added and Policy_ID = 4
had a change in Occupation.
So, after running the program this is how Table A changes.
CDC_Ind
inferring I for Insert, U for Update and D for Delete and it is done over the Primary Key column Policy_ID
.
Then the March data comes in
Policy_ID DOB Gender Occupation
11 290791 M ABC
So, now my master table changes to something like this.
In a real world, I have the data for 50-60 months.
If I want to retrieve a particular month data (leaving Act_Ind
and CDC_Ind
) how do I write a query?
Expected Results for Feb 2019
Policy_ID DOB Gender Occupation
1 290691 M ABC
2 290692 M ABC
4 290693 M ABC
5 290694 F DEF
10 290791 M ABC
This should do what you expect:
select Policy_ID, DOB, Gender, Occupation
from (
select
t.*,
row_number() over(partition by Policy_ID order by Load_date desc) rn
from masterTable t
where Load_date < '2019-03-01'
) t
where rn = 1 and CDC_Ind <> 'D'
This phrases as: get the last record of each Policy_ID
prior to the date given as parameter, unless that record has CDC_Ind <> 'D'
(ie corresponds to a deletion).
For the date parameter, you want to give the first day of the next month (so for February, you can pass March 1st, as shown in the query).
Policy_ID | DOB | Gender | Occupation --------: | -----: | :----- | :--------- 1 | 290691 | M | ABC 2 | 290692 | M | ABC 4 | 290694 | M | DEF 5 | 290695 | F | ABC 10 | 290791 | M | ABC