Search code examples
databasecomparisondata-warehousedata-lakedatamart

Database vs DataMart vs Data Warehouse vs Data Lake


Looking for the high-level differences/comparison among

  • Database
  • Data Mart (Top-down approach)
  • Data Warehouse
  • Data Lake
  • Data Lakehouse

Please use relative comparison when specifics are not available.


Solution

  • Included below is a high-level comparison between the various data tiers mentioned. Please feel free to drop a comment if any of these need corrections.

    Feature Database Data Mart Data Warehouse Data Lake Data Lakehouse
    Source Single Single Multiple Multiple Multiple
    Structure Structured Structured Structured Raw Structured, semi-structured, and unstructured
    Purpose Determined Determined Determined Determined Determined
    Storage Centralized Decentralized Centralized Centralized Centralized
    Data Format Detailed Summarized Both detailed and summarized All All
    Flexibility Low Medium Medium High High
    Primary Use Transactional Reporting Analytics & Reporting Analytics Analytics
    Cost Low Medium Medium High High
    Data Volume Low Low Medium High High
    Development Top-down Bottom-up Top-down All All
    Design Time Medium Medium High Low Low
    Volatility Medium Low None None None
    Data Operations CRUD CR CRU CR CRUD
    Subject Area Single Single Multiple Multiple Multiple
    Design Schema Relational Multi-dimensional Relational No schema Hybrid

    Notes:

    • The cost of a data lakehouse can be lower than a data warehouse if the data is stored in a cloud-based object storage system.
    • The data volume of a data lake can be much higher than a data warehouse or data mart.
    • The development time for a data lakehouse can be lower than a data warehouse if the data is already stored in a cloud-based object storage system.
    • The volatility of a data lake can be higher than a data warehouse or data mart because the data is not always structured and may change frequently.

    Fit-gap

    • If you need a low-cost, flexible repository for structured data, then a database is a good option.
    • If you need a repository for summarized data for reporting purposes, then a data mart is a good option.
    • If you need a repository for detailed data for analytics and reporting purposes, then a data warehouse is a good option.
    • If you need a repository for all types of data for analytics and machine learning purposes, then a data lakehouse is a good option.