Implementing a Data Warehouse


I have just started learning about the Big Data domain, so I would like to know which database management system, besides SQL Server, is best for implementing a data warehouse.


Solution

  • This question is hard to answer because little information was given. These are some of the questions I would ask before deciding:

    • Big Data can be about Variety. So perhaps "which database is right?" is not the first question to ask. Instead, ask: What does the data look like? Is it relational? Is it NoSQL-style, e.g. JSON or XML documents? Is it a mixture of both? Depending on the answer, you might use a single PostgreSQL instance, or a mixed "Data Lake" environment with Hadoop components such as HDFS/Hive and Spark, plus, for example, a MongoDB instance for semi-structured JSON data.

    • Big Data can be about Velocity. Again, the questions should be: How much data has to be ingested, and in how much time? Must all of this data be handled transactionally? Can some information be dropped if the pipeline is not fast enough to consume it? Will the Big Data infrastructure be hosted in the cloud or on premises?

    • Big Data can be about Volume. So, how large an environment has to be planned? How much data is there now? How much will there be in a year? What is the growth rate? The answers may lead you to avoid licensed tools to save on license fees. They also feed into the decision whether to build the environment in the cloud or on premises. For an on-premises setup, you also need to clarify whether high availability is a requirement.

    To answer this question properly, we would need to know a lot more about the planned use case. If you really just want to store relational data, there are several published lists of database systems to choose from.

    Off the top of my head, for example:

    • MySQL
    • PostgreSQL
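Before choosing any of these systems, it can help to put rough numbers on the Volume and Velocity questions above. The following is a minimal back-of-the-envelope sketch in Python, not a sizing tool. The `Workload` fields, the example figures, and the assumptions that growth compounds monthly and that nothing is deleted, compressed or replicated are all hypothetical and should be replaced with your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload figures; replace them with your own measurements."""
    events_per_second: float    # Velocity: average incoming events per second
    bytes_per_event: int        # average stored size of one event, in bytes
    current_volume_gb: float    # Volume: data already stored today, in GB
    monthly_growth_rate: float  # e.g. 0.05 means the ingest rate grows 5% per month


def ingest_gb_per_day(w: Workload) -> float:
    """Raw data arriving per day, in GB (1 GB = 10**9 bytes)."""
    return w.events_per_second * w.bytes_per_event * 86_400 / 1e9


def projected_volume_gb(w: Workload, months: int, days_per_month: int = 30) -> float:
    """Stored volume after `months`, assuming (hypothetically) that the daily
    ingest rate compounds once per month and no data is deleted, compressed
    or replicated."""
    total = w.current_volume_gb
    daily = ingest_gb_per_day(w)
    for _ in range(months):
        total += daily * days_per_month       # data added during this month
        daily *= 1 + w.monthly_growth_rate    # ingest rate grows for next month
    return total


if __name__ == "__main__":
    # Example figures only: 2,000 events/s of 500 bytes each, 100 GB stored today.
    w = Workload(events_per_second=2_000, bytes_per_event=500,
                 current_volume_gb=100, monthly_growth_rate=0.05)
    print(f"ingest: {ingest_gb_per_day(w):.1f} GB/day")  # prints "ingest: 86.4 GB/day"
    print(f"after 12 months: {projected_volume_gb(w, 12):,.0f} GB")
```

Indexes, replication and compression can change the stored size considerably, so treat the result only as an order-of-magnitude input to the licensing and cloud vs. on-premises questions, not as a capacity plan.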