Search code examples
etlanalyticsdata-warehousebusiness-intelligenceazure-synapse

what are possible benefits of having separate layer/a dedicated schema for each layer in ETL (extract , transform and load)?


what are possible benefits of having separate layer/a dedicated schema for each layer in ETL (extract , transform and load), I mean one dedicated layer for staging , one for type 1 persistent tables and a dedicated schema/layer for tables at dimensional model ? why it is not recommended to have all tables in one schema?

Thanks, Rajneesh


Solution

  • Restaurant Metaphor:

    Datawarehouse is like a restaurant. You get vegetables, ingredients from multiple groceries. But, you cannot prepare the food in the table of the customer.

    • You need separate area called store room to keep the vegetables.
    • You need separate area called kitchen top to check quality of vegetables, clean the vegetables, cut the vegetables to conform to right size for the recipe, add right ingredients and make the final recipe.
    • You need separate area called banquet hall to serve final recipe to customer.

    Now, take the case of data warehouse.

    • You need to get different data coming from different source systems. some of them from ORACLE, some from Teradata, some from Enterprise Message System etc. The goal of staging area is for storing raw data. There are dependencies between ingredients. You need to have all of them to prepare the recipe. You need to have them in the staging layer. It is same like store room.

    • The data is cleansed, validated, conformed to right dimensions, added with additional keys, data transformed to suit the data warehouse. Mostly these transformations happen in memory. In some cases, there could be intermediate storage into tables. It is transformation layer. It is same like Kitchen top.

    • The transformed data is loaded into datawarehouse tables. It is loaded into dimensional model. It is the presentation layer or loading layer. It is the banquet hall, where food is available for buffet dinner.

    So, you need three separate layers, so that all of them can be working without interrutping others. There are also additional factors like:

    • Security pertaining to different layers
    • Troubleshooting of the data issue to the source
    • Historical data maintenance
    • Meeting compliance
    • Completely rebuilding data warehouse, if needed from staging area
    • many more