MemSQL and H2 comparison

I am considering moving from H2 to MemSQL, and I would greatly appreciate any comments.

My application has to run very fast concurrent queries against large tables of up to 300 million rows. To achieve this I have been using the H2 in-memory database.

H2 allows me to create linked tables in the in-memory database that point to a MySQL database. This is very useful for loading data from MySQL into H2.
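For context, an H2 linked table is declared with a CREATE LINKED TABLE statement that names a JDBC driver, a URL, credentials, and the source table. The sketch below only builds the SQL text, so it runs without any database driver on the classpath. The class name, the table names, the MySQL URL, the credentials, and the driver class shown are assumptions for illustration and are not taken from the application described here.

```java
public class LinkedTableSql {

    // Wrap a value in single quotes, doubling embedded quotes so it is a valid SQL string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Build an H2 statement of the form:
    // CREATE LINKED TABLE name(driver, url, user, password, sourceTable)
    static String createLinkedTable(String linkName, String driver, String url,
                                    String user, String password, String sourceTable) {
        return "CREATE LINKED TABLE " + linkName + "("
                + quote(driver) + ", " + quote(url) + ", "
                + quote(user) + ", " + quote(password) + ", "
                + quote(sourceTable) + ")";
    }

    // Build a statement that copies the linked table's rows into a local H2 table.
    static String copyIntoMemory(String localName, String linkName) {
        return "CREATE TABLE " + localName + " AS SELECT * FROM " + linkName;
    }

    public static void main(String[] args) {
        // Hypothetical connection details; replace with real ones.
        String link = createLinkedTable(
                "ORDERS_LINK",
                "com.mysql.jdbc.Driver",                  // assumed driver class name
                "jdbc:mysql://localhost:3306/sourcedb",   // assumed source database URL
                "app", "secret", "orders");
        System.out.println(link);
        System.out.println(copyIntoMemory("ORDERS", "ORDERS_LINK"));
    }
}
```

Each generated string would then be passed to Statement.execute on a connection to the in-memory H2 database (for example one opened with a jdbc:h2:mem: URL).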

Can I create linked tables in MemSQL? I see no references to this in the online MemSQL documentation.

Another challenge is that I will need to run multiple instances of the application across many servers, so having MemSQL running distributed across servers is very attractive compared with duplicating the H2 database in every JVM instance of the application. Running one instance of H2 and serving the other servers over TCP would be too slow.

The other advantage I see with MemSQL is that there is apparently no locking and that queries are compiled to native code via C++, which could speed them up.

Has anyone compared MemSQL performance with H2? I've found nothing online from real-world tests.

Solution

Mark L here from MemSQL. I wanted to address a few of your questions and offer additional help in getting the info/benchmarks you're asking about.

MemSQL does support linked tables via the JDBC connector, which in practice works just as it would with MySQL, so you should have no issues getting that to work. Running MemSQL in distributed mode should indeed give you a big performance advantage, with significant improvements in both throughput and latency. I haven't found a direct comparison between H2 and MemSQL. However, you can draw some indirect conclusions by looking at comparisons of MemSQL vs. MySQL, since we have the comparison data for H2 vs. MySQL from the website. From our field experience I would expect you to observe significant performance gains when using MemSQL.

A few general observations: the distributed version of MemSQL gives you several advantages that you can't get from H2. The highlights are reads that never block writes thanks to lock-free indexes, full MVCC (H2 can only do this on a single box), and auto-sharding of data. Of these, auto-sharding is likely to be the most substantial for your use case. H2 can't auto-shard the data, and having that ability in a distributed setup is a big advantage even if speed were equal between the two. As I mentioned, though, I expect MemSQL distributed to be much faster, as well as easier to manage than multiple instances of H2.
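To make the idea of auto-sharding concrete, here is a minimal sketch of hash-based routing: each row's shard key is hashed and mapped to one of a fixed number of partitions, so rows spread across servers without the application choosing a location. This illustrates the general technique only. It is not MemSQL's actual hash function or partitioning code, and the class name, method names, and partition count are hypothetical.

```java
import java.util.Arrays;

public class ShardRouter {

    // Map a shard key to a partition index in the range [0, partitionCount).
    // Math.floorMod keeps the result non-negative even when the hash is negative.
    static int partitionFor(long shardKey, int partitionCount) {
        return Math.floorMod(Long.hashCode(shardKey), partitionCount);
    }

    // Count how many of the given keys land on each partition.
    static int[] distribution(long[] shardKeys, int partitionCount) {
        int[] counts = new int[partitionCount];
        for (long key : shardKeys) {
            counts[partitionFor(key, partitionCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] keys = {0, 1, 2, 3, 4, 5, 6, 7};
        // Show how eight consecutive keys spread over four partitions.
        System.out.println(Arrays.toString(distribution(keys, 4)));
    }
}
```

With H2 duplicated in every JVM, each instance holds the full table. With routing like this, each node holds only its share of the rows, and a query can be fanned out to the partitions in parallel.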

In any case, we're more than happy to help you prove this out! Please feel free to reach out to me via email: larosa at memsql dot com.