Search code examples
sqlsql-serversql-server-2008cdcchange-data-capture

How do I reconstruct a historical view?


I am currently exploring Change Data Capture as an option to store temporal databases. It is great because it stores only the deltas and seems like it may solve my problem. When I enabled CDC, a bunch of tables appeared under System Tables.

When querying cdc.dbo_MyTable, I am able to see all the changes that took place on the table. Now, how would I construct a historical view? For instance, if I wanted to see the state of my table as of a particular date, how would I go about doing that? Is that even possible?

It looks I need to take the log and start applying it over my original table but I was wondering if there is a built-in way of doing this. Any suggestions?

Some of the use cases I am looking at:

  • Know the state of the graph at a particular point in time
  • Given two graphs at different times, know the set of links that are different (this can probably be obtained using an EXCEPT clause after constructing the tables)

Solution

  • it's possible, but not with a built-in way i'm a afraid. You would have to reconstruct the timeline by hand.

    Given that the change-tracking tables offer the tran_end_time, which is the time that the value of the property should be perceived as persisted, you would have to make a query that fetches all the distinct periods of table states, join on the tracked property changes and then pivot (to have a presentation in the same form as the table). Don't forget to union with the table state itself to obtain the values that have not been changed/tracked for completeness.

    The final result, simplified, should look like

    RN PK PropA   PropB   FromDate          ToDate
    1  1  'Ver1'  'Ver1'  2012-01-01 09:00  2012-01-02 08:00
    2  1  'Ver1'  'Ver2'  2012-01-02 08:00  2012-01-03 07:00
    3  1  'Ver2'  'Ver2'  2012-01-03 07:00  *getdate()*
    4  2  'Ver1'  'Ver1'  2012-01-01 05:00  2012-01-02 06:00
    5  2  'Ver1'  'Ver2'  2012-01-02 06:00  2012-01-03 01:00
    6  2  'Ver2'  'Ver2'  2012-01-03 01:00  *getdate()*
    

    note that the getdate() is valid if the row wasn't deleted in which case it should be substituted with the deletion date

    EDIT, for the 2 use cases. The first point is easily addressed it's a matter of constructing the temporal object graph and then filtering:

    declare @pointInTime datetime = '20120102 10:00';
    select * from Reconstructed_TG where FromDate <= @pointInTime and @pointInTime < ToDate
    

    the second point, can be generated easily with the EXCEPT clause, as you point out. given the above query:

    declare @pointInTimeA datetime = '20120102 10:00';
    declare @pointInTimeB datetime = '20120103 01:00';
    select * from Reconstructed_TG where FromDate <= @pointInTimeA and @pointInTimeA < ToDate
    EXCEPT
    select * from Reconstructed_TG where FromDate <= @pointInTimeB and @pointInTimeB < ToDate
    

    yet the except clause only presents the rows that have at least one different column value; i don't know if that information is really meaningful to the human eye. Depending on your needs a query that works directly on the cdc data may be more appropriate.