
Which is the best free data warehouse product?


I am developing a system that involves a lot of OLAP work. According to my research, a column-based data warehouse is the best choice, but I am struggling to choose a good product.

  1. All the data warehouse comparison articles I have found were written before 2012, and there seem to be few recent ones. Are data warehouses out of date? Is Hadoop HBase better?

  2. As far as I know, InfiniDB is a high-performance open source data warehouse product, but it has not been maintained for 2 years (https://github.com/infinidb/infinidb), and there is little documentation for it. Has InfiniDB been abandoned by its developers?

  3. Which is the best data warehouse product by now?

  4. How do I incrementally move my business data stored in a MySQL database to the data warehouse?

Thank you for your answer!


Solution

    1. Data warehousing is still a hot topic. HBase is not the fastest option, but it is very well known and widely compatible (many applications are built on it).

    2. I went looking for a good column store some years ago and finally chose InfiniDB because of the easy migration from plain MySQL. It is a nice piece of software, but it still has bugs, so I cannot fully recommend it for production use (not without a second failover instance). However, MariaDB has picked up the InfiniDB technology and is porting it to their MariaDB database server. This new product is called MariaDB ColumnStore [1], of which a testing build is available. They have already put a lot of effort into it, so I think ColumnStore will become a major MariaDB product within the next two years.

    3. I can't answer that. I am still using InfiniDB and also helping others with their projects.

    4. This totally depends on your data structure and usage.

    InfiniDB is great at querying: in my tests it performed about 8% better than Impala. However, while InfiniDB supports INSERT, UPDATE, DELETE and transactions, it is not good at transactional workloads. For example, just moving a community-driven website, where visitors are constantly manipulating data, to InfiniDB will NOT work well. One insert with 10000 rows will work well; 10000 inserts with 1 row each will kill it.
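
    To illustrate the difference between many single-row inserts and one multi-row insert, here is a minimal Python sketch that groups rows into batches and builds one multi-row INSERT statement per batch. The helper names (`chunked`, `build_multirow_insert`), the `visits` table in the usage note and the `%s` placeholder style (as used by common MySQL drivers) are assumptions for illustration, not part of any InfiniDB tooling.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def chunked(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows from `rows`."""
    batch: List[Sequence] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_multirow_insert(
    table: str,
    columns: Sequence[str],
    rows: Sequence[Sequence],
    placeholder: str = "%s",  # assumption: driver uses the "format" paramstyle
) -> Tuple[str, list]:
    """Build ONE INSERT statement covering all `rows`, plus its flat parameter list.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    row_template = "(" + ", ".join([placeholder] * len(columns)) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([row_template] * len(rows))
    )
    params = [value for row in rows for value in row]
    return sql, params
```

    With a DB-API cursor, each batch then becomes a single round trip, e.g. `cursor.execute(*build_multirow_insert("visits", ["id", "name"], batch))` for every `batch` in `chunked(rows, 10000)`. The right batch size depends on the server's packet and statement limits.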

    We deployed InfiniDB for our customers to 'aid' the query performance of a regular MariaDB installation: we created a tool that imports and updates MariaDB database tables into InfiniDB for faster querying. Manipulations on those tables are still done in MariaDB, and the changes get batch-imported into InfiniDB with a 30-second delay. As the original and InfiniDB tables have the same structure and are both accessible through the MySQL API, we can simply switch the database connection and get super-fast SELECT queries. This works well for our use case.
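
    A batch import of this kind could be sketched roughly as follows. This is not our actual tool, only an illustration of one common approach (a "last changed" watermark). It uses Python's built-in `sqlite3` as a stand-in for both the MariaDB source and the column store target so that it runs anywhere; the `orders` table, its columns and the `updated_at` change marker are all hypothetical.

```python
import sqlite3


def sync_once(src: sqlite3.Connection, dst: sqlite3.Connection, last_seen: int) -> int:
    """Copy rows changed since `last_seen` from src to dst as one batch.

    Returns the new watermark (highest `updated_at` copied so far).
    Assumes every write on the source bumps `updated_at`; rows written
    later with a timestamp equal to the watermark would be missed.
    """
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    if not rows:
        return last_seen

    ids = [row[0] for row in rows]
    id_marks = ", ".join(["?"] * len(ids))
    row_marks = ", ".join(["(?, ?, ?)"] * len(rows))
    params = [value for row in rows for value in row]

    with dst:  # one transaction per batch: commit on success, roll back on error
        # Replace changed rows: one DELETE and one multi-row INSERT,
        # instead of one statement per row.
        dst.execute(f"DELETE FROM orders WHERE id IN ({id_marks})", ids)
        dst.execute(
            f"INSERT INTO orders (id, amount, updated_at) VALUES {row_marks}",
            params,
        )
    return rows[-1][2]
```

    Calling `sync_once` in a loop with a 30-second sleep between calls, and feeding each returned watermark into the next call, gives the delayed batch behaviour described above. Deleted source rows are not handled by this sketch.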

    We also built new statistics/analytics applications from the ground up to work with InfiniDB and replace an older MySQL-based system. That also works great and exceeds all performance expectations (we now have 15x the data we had in MariaDB, and it is still easier to maintain and much faster to query).

    [1] https://mariadb.com/products/mariadb-columnstore