We are in a process of building a health data warehouse. And have been having discussions over the basic structure of the data warehouse. I need your suggestions on pros and cons of the below structures. DWH will be used for reporting and research purpose. It will be a near real time data warehouse with latency time of around 5-10 minutes.
The Source database has one Encounter/visit table. Everything is saved in this table. It's the central table which links everything. So If I need to get a patient's journey in the production database, I just go to the encounter/visit table and see how many times a patient has come for a treatment/has been admitted or went back from emergency, has been admitted from emergency etc.
model 1 ->
Encounter/visit table having the common fields (like encounter_id,arrival_date,care_type etc)
and then further tables can be built as per the encounter types with encounter specific fields : Encounter_Emergency (Emergency specific fields such as emergency diagnosis, triage category etc) Encounter_Inpatient Encounter_outpatient
Model 2 -> Having separate tables as base tables and then create a view on the top which then includes all the encounter types together.
Encounter_Emergency (Emergency specific fields such as emergency diagnosis,triage category etc) Encounter_Inpatient Encounter_outpatient
model 3 ->
Encounter/visit table having all the fields as the source database and views are created as per the encounter types with encounter specific fields :
view_Encounter_Emergency view_Encounter_Inpatient view_Encounter_outpatient
these views can be further combined with the emergency_diagnosis table to get the diagnosis or emergency_alerts table to access the alerts etc.
A prime consideration would be how often there will be additions, deletions, or alterations to Encounter Types.
Model B will require extensive rework in advance of any such change just to make sure the data continues to be captured. Either of the other two models will continue to capture reclassed data, but will require rework to report on it.
As between A and C, the question becomes traffic. Views are comparatively easy to spin up/down, but they'll be putting load on that big base table. That might be acceptable if the DW won't have tons of load on it. But if there will be extensive reporting (Pro Tip there's always more extensive reporting than the business tells you there will be), it may be more advantageous to break the data out into stand alone tables.
There is, of course, ETL overhead to maintaining all of those tables.
For speed of delivery, perhaps build Model C, but architect Model A in case consumption requires the more robust model. For the record, you could build Views that don't have any kind of vw_
prefix, or any other identifier in their names that lets users know that they're views. Then, later, you can replace them with tables of the same name, and legacy queries against the old views will continue to work. I've done just the same thing in the opposite direction, sneaking in views to replace redundant tables.