Search code examples
sql-serversql-server-2008ssascdcchange-data-capture

Change Data Capture and SQL Server Analysis Services


I'm designing a database application where data is going to change over time. I want to persist historical data and allow my users to analyze it using SQL Server Analysis Services, but I'm struggling to come up with a database schema that allows this. I've come up with a handful of schemas that could track the changes (including relying on CDC) but then I can't figure out how to turn that schema into a working BISM within SSAS. I've also been able to create a schema that translates nicely in to a BISM but then it doesn't have the historical capabilities I'm looking for. Are there any established best practices for doing this sort of thing?

Here's an example of what I'm trying to do:

I have a fact table called Sales which contains monthly sales figures. I also have a regular dimension table called Customers which allows users to look at sales figures broken down by customer. There is a many-to-many relationship between customers and sales representatives so I can make a reference dimension called Responsibility that refers to the customer dimension and a Sales Representative reference dimension that refers to the Responsibility dimension. I now have the Sales facts linked to Sales Representatives by the chain of reference dimensions Sales -> Customer -> Responsibility -> Sales Representative which allows me to see sales figures broken down by sales rep. The problem is that the Sales facts aren't the only things that change over time. I also want to be able to maintain a history of which Sales Representative was Responsible for a Customer at the time of a particular Sales fact. I also want to know where the Sale Representative's office was located at the time of a particular sales fact, which may be different than his current location. I might also what to know the size of a customer's organization at the time of a particular Sales fact, also which might be different than it is currently. I have no idea how to model this in an BISM-friendly way.


Solution

  • You mentioned that you currently have a fact table which contains monthly sales figures. So one record per customer per month. So each record in this fact table is actually an aggregation of individual sales "transactions" that occurred during the month for the corresponding dimensions.

    So in a given month, there could be 5 individual sales transactions for $10 each for customer 123...and each individual sales transaction could be handled by a different Sales Rep (A, B, C, D, E). In the fact table you describe there would be a single record for $50 for customer 123...but how do we model the SalesReps (A-B-C-D-E)?

    Based on your goals...

    • to be able to maintain a history of which Sales Representative was Responsible for a Customer at the time of a particular Sales fact
    • to know where the Sale Representative's office was located at the time of a particular sales fact
    • to know the size of a customer's organization at the time of a particular Sales fact

    ...I think it would be easier to model at a lower granularity...specifcally a sales-transaction fact table which has a grain of 1 record per sales transaction. Each sales transaction would have a single customer and single sales rep.

    FactSales
        DateKey (date of the sale)
        CustomerKey (customer involved in the sale)
        SalesRepKey (sales rep involved in the sale)
        SalesAmount (amount of the sale)
    

    Now for the historical change tracking...any dimension with attributes for which you want to track historical changes will need to be modeled as a "Slowly Changing Dimension" and will therefore require the use of "Surrogate Keys". So for example, in your customer dimension, Customer ID will not be the primary key...instead it will simply be the business key...and you will use an arbitrary integer as the primary key...this arbitrary key is referred to as a surrogate key.

    Here's how I'd model the data for your dimensions...

    DimCustomer
        CustomerKey (surrogate key, probably generated via IDENTITY function)
        CustomerID (business key, what you will find in your source systems)
        CustomerName
        Location (attribute we wish to track historically)
        -- the following columns are necessary to keep track of history
        BeginDate
        EndDate
        CurrentRecord
    
    DimSalesRep
        SalesRepKey (surrogate key)
        SalesRepID (business key)
        SalesRepName
        OfficeLocation (attribute we wish to track historically)
        -- the following columns are necessary to keep track of historical changes
        BeginDate
        EndDate
        CurrentRecord
    
    FactSales
        DateKey (this is your link to a date dimension)
        CustomerKey (this is your link to DimCustomer)
        SalesRepKey (this is your link to DimSalesRep)
        SalesAmount
    

    What this does is allow you to have multiple records for the same customer. Ex. CustomerID 123 moves from NC to GA on 3/5/2012...

    CustomerKey | CustomerID | CustomerName | Location | BeginDate | EndDate | CurrentRecord
    1 | 123 | Ted Stevens | North Carolina | 01-01-1900 | 03-05-2012 | 0
    2 | 123 | Ted Stevens | Georgia        | 03-05-2012 | 01-01-2999 | 1
    

    The same applies with SalesReps or any other dimension in which you want to track the historical changes for some of the attributes.

    So when you slice the sales transaction fact table by CustomerID, CustomerName (or any other non-historicaly-tracked attribute) you should see a single record with the facts aggregated across all transactions for the customer. And if you instead decide to analyze the sales transactions by CustomerName and Location (the historically tracked attribute), you will see a separate record for each "version" of the customer location corresponding to the sales amount while the customer was in that location.

    By the way, if you have some time and are interested in learning more, I highly recommend the Kimball bible "The Data Warehouse Toolkit"...which should provide a solid foundation on dimensional modeling scenarios.