Search code examples
sqlsql-serverdategreatest-n-per-groupwindow-functions

Retrieve a particular month SQL Records from Master Table - CDC Enabled


I have a table, Table A, in SQL Server that is CDC enabled by Application Programming.

Table A looks like this in Month 1.

Policy_ID   DOB      Gender   Occupation CDC_Ind  Load_date        Act_Ind
1           290691    M          ABC       I          01/01/2019     Y
2           290692    M          ABC       I          01/01/2019     Y
3           290693    F          ABC       I          01/01/2019     Y
4           290694    M          ABC       I          01/01/2019     Y
5           290695    F          ABC       I          01/01/2019     Y

I receive a new file at the beginning of the next month, Feb 1.

That Feb file looks like this.

Policy_ID   DOB      Gender   Occupation 
1           290691    M          ABC  
2           290692    M          ABC   
4           290693    M          ABC 
5           290694    F          DEF     
10          290791    M          ABC

It can be seen that Policy_ID = 3 is not in the new file and Policy_ID = 10, a new ID is added and Policy_ID = 4 had a change in Occupation.

So, after running the program this is how Table A changes.

enter image description here

CDC_Ind inferring I for Insert, U for Update and D for Delete and it is done over the Primary Key column Policy_ID.

Then the March data comes in

Policy_ID   DOB      Gender   Occupation    
11          290791    M          ABC

So, now my master table changes to something like this.

enter image description here

In a real world, I have the data for 50-60 months.

If I want to retrieve a particular month data (leaving Act_Ind and CDC_Ind) how do I write a query?

Expected Results for Feb 2019

Policy_ID   DOB      Gender   Occupation 
1           290691    M          ABC  
2           290692    M          ABC   
4           290693    M          ABC 
5           290694    F          DEF     
10          290791    M          ABC  

Solution

  • This should do what you expect:

    select Policy_ID, DOB, Gender, Occupation
    from (
        select 
            t.*,
            row_number() over(partition by Policy_ID order by Load_date desc) rn
        from masterTable t
        where Load_date < '2019-03-01'
    ) t
    where rn = 1 and CDC_Ind <> 'D' 
    

    This phrases as: get the last record of each Policy_ID prior to the date given as parameter, unless that record has CDC_Ind <> 'D' (ie corresponds to a deletion).

    For the date parameter, you want to give the first day of the next month (so for February, you can pass March 1st, as shown in the query).

    Demo on DB Fiddle:

    Policy_ID |    DOB | Gender | Occupation
    --------: | -----: | :----- | :---------
            1 | 290691 | M      | ABC       
            2 | 290692 | M      | ABC       
            4 | 290694 | M      | DEF       
            5 | 290695 | F      | ABC       
           10 | 290791 | M      | ABC