Search code examples
sqldata-warehouse

SCD2 WITH FACT TABLE IMPLEMENTATION


I am asked to build a client dimension and a bed dimension .

and bring them together in the sense of clientID-SK,bedID_SK,Bed_begin_date,bed_end-date.Both tables contains SCD1, and SC2 fields.How do I implement this if the dates the clients was and out off bed and out has nothing to do with what defines as a client or bed(types).

I have been able to combine them but my challenge is that when I load them into a fact table the table only has the begin_date .How will I update the fact table end_date which is suppose to = the begin_date of the next bed assignment.

e.g clientID,bedID,Start_Date,End_Date 10 ,ROO1, ,01-19-2020, 3000-01-01 00:00:00.000

Dimension 10 ,ROO1, ,01-19-2020, 10-19-2020 10 ,ROO2, ,10-19-2020, 3000-01-01 00:00:00.000

We have a table called current bed that keeps track of our current client and I was able to build a slowly changing dimension off that table.

But we are concerned to follow standard practice we have to have a star schema in place .

Any suggestion


Solution

  • So you have, at least, the following tables:

    1. Client Dimension holding all the client attributes
    2. Bed Dimension holding all the Bed attributes
    3. A Date Dimension
    4. A Bed Occupancy Fact with FKs to Client Dim, Bed Dim and 2 FKs to Date Dim (one for Bed occupied and one for bed vacated)

    When a bed is first occupied by a client you create a new fact record and populate the Client, Bed and Date Occupied FKs. You populate the Bed Vacated with 0 (or whatever key value you have used in the Date Dim to indicate the 'unknown' record).

    When a bed is next occupied, you create a new fact record for the new client and update the Bed Vacated FK on the previous record with the relevant Date key.

    A few things to think about:

    1. Are you only working at the Date level of granularity or at Time level i.e. are you interested in what time of day (or morning/afternoon, etc.) when a bed was occupied/vacated?
    2. I would ensure that the Date Vacated of the previous occupancy and the Date Occupied of the current one are not the same value otherwise you can get double counting on that overlapping date unless you start implementing logic to prevent it. For example, if a bed is occupied on the 25th Sept then set the Vacated date of the previous record to 24th Sept
    3. Can you have periods when a bed is unoccupied? If you can, then I would create a fact record for this in exactly the same way as you would for an occupied bed but set the client ID FK to 0 (or whatever value you use in the client Dim to indicate a "not applicable" client)

    Hope this helps?

    Update 1 following response

    If you need to include Time then you need a time dimension and 2 additional keys in the fact for occupied and vacated time.

    I'm not sure I understand your question about how you update the fact table. You have the information required to identify the fact record (bed id and vacated date key = 0) and the value needed to update the fact record. What am I missing?

    UPDATE 2

    I think you need to take a step back and think clearly about what it is you are trying to achieve - then the answers to your questions should become more obvious.

    The first question you need to ask is what are you trying to measure: once you have clearly defined that then the grain of the fact table is established and it becomes clearer what changes in attributes you need to handle. For example:

    1. If you just want to know the status of a bed every time the occupant changes, and only the status of the occupant when they first use the bed (or last use the bed), then you only need to add a fact record when the bed occupancy changes and there is no need to record any updates during that patient's occupancy
    2. If you want to know the state of of the bed at any point in time then first you need to define what you mean by "any point in time": every day, hour, minute, etc? Then you need to decide what you want to record if there are multiple changes in that time period i.e. the position at the start of the hour or the end of the hour. Based on these decisions, you then need to work out if there have been any changes during that time period and, if there have been, insert/update the relevant records
    3. If you want to treat each patient's occupancy of a bed as a single fact then your fact record obviously has start and end dates but you also need to make the decision about which single state you are going to record for any attributes that can change over that period - you can record the patient's status at the start or end of the occupancy but not throughout the occupancy as that would affect the grain of the fact table

    So to try and answer your questions...

    If there is a change in dimension attributes and it affects your fact table then you'll need to handle this e.g. by inserting or updating a fact record:

    • If you are only interested in the state of the patient at the start or end of the occupancy then any change to the patient's attributes during the occupancy can be ignored
    • If you are interested in the state of the patient at any point in the occupancy then you'll need to make changes to the fact table whenever one of the patient's attributes changes

    Records in your fact table should never overlap each other - so at any point in time there is only one active fact record per bed and per patient. Each time you insert a new fact record you would expire the previous applicable fact record.

    So when you ask "The update to the end_date when the client moves to a new bed will be on all 3 added surrogate key rows?", the answer is no - you would have set the end date on the first 2 records when you created the next record each time i.e. set the end date of record 1 when you create record 2, set the end date of record 2 when you create record 3, etc.; so you will only be updating the last record when the client moves.

    Adding a PK to a fact table is only required when there is a requirement to update the fact table - as is the case here. Whether you do so is a choice - but I would look at how complicated the compound key is i.e. how many SKs do you need to use to identify the correct fact record to be updated. In you case you only need the Bed SK and the end_date = null (or 31/12/3000 or however you have chosen to set it) so there is probably no benefit in defining a single PK field on the fact table. If you needed more than about 5 SKs to identify a fact record then there is probably a case for using a single PK field.

    UPDATE 3 - following comment added on 17/11/2020

    Mini-dimensions: just seem to be more, unnecessary complication but I can't really comment unless you can clearly articulate what the issue is that you think mini-dimensions will solve and why you think mini-dimensions are a solution to the issue

    Dates

    You seem to be confused about the effective dates on an SDC2 dimension and foreign keys on a Fact table referencing the Date dimension - as they are very different things.

    Date FKs on a Fact are attributes that you have chosen to record for that fact. In your example, for each bed occupancy fact (i.e. a single record in your fact table) you might have "Date Occupied" and "Date Vacated" attributes/FKs that reference the Date Dimension. When a fact record is created you would populate the "Date Occupied" field with the appropriate date and the "Date Vacated" with "0" (or whatever value points to the "Unknown" record in your Date Dimension). When the bed becomes unoccupied you update the fact record and set the "Date Vacated" field to the appropriate date.

    Because you need to record 2 different dates against the fact, you need to have two FKs referencing the Date dimension; you couldn't record the Date Occupied and the Date Vacated using a single reference to the Date Dimension.

    The same type of thinking applies when you want to have an FK on a fact table that references an SCD2 dimension; you need to decide what the point-in-time context of that reference is and then link to the correct version of the record in the SCD2 dimension. So if you want to record the state of the patient at the point they occupy the bed then you pick their record in the dimension where Fact.DateOccupied between Dim.EffStartDate and Dim.EffEndDate. If you want to also record the date of the patient at a different (but specific) time, such as when the bed was vacated, then you would need to add a separate FK to the fact table to hold this additional reference to the Patient Dim.

    Having populated your fact table, if you want to know the state of the patient at a specific point in time you don't need to do anything to the fact table; instead you need to join the Patient Dim to itself. e.g.

    1. The fact table holds an FK that references a record in the Patient Dim
    2. From this Patient Dim record you can get the patient's BK
    3. Join from this BK back to the Patient Dim and filter on the date that you want to get the patient's details for

    Pseudo-code SQL for this would look something like (assuming you wanted to know the state of the patient on '2020-11-17'):

    SELECT
        P2.*
    FROM
        FACT_TABLE F
        INNER JOIN PATIENT_DIM P1
            ON F.PATIENT_SK = P1.PATIENT_SK
        INNER JOIN PATIENT_DIM P2
            ON  P1.PATIENT_BK         = P2.PATIENT_BK
                AND P2.EFFSTART_DATE <= '2020-11-17'
                AND P2.EFF_END_DATE  >= '2020-11-17'
    

    Hope this helps?