cloud benchmarking microbenchmark voltdb ycsb

YCSB for VoltDB


Does anyone know whether there is a YCSB client/driver implementation available for benchmarking VoltDB? Or any reference publications, blog posts, articles, or research projects?

Can we use TPC workloads for VoltDB benchmarking?

Thanks a lot everyone.


Solution

  • VoltDB developer here.

    There is no official YCSB driver, although several users have done benchmarking using the YCSB framework. There is a bit of an impedance mismatch between YCSB and VoltDB. YCSB is designed to work with range-sharded column stores. VoltDB is a hash-sharded relational store with rich support for server-side logic.

    This manifests as a problem in three ways.

    The first is that YCSB requires range scans. You can do efficient range scans in a hash-sharded store if you have some knowledge of the key distribution and can normalize keys so they bucket usefully. Here is an example of how you would do it in Cassandra.

    It's not insurmountable, but it requires some thought.
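    As a rough illustration of the bucketing idea, here is a minimal in-memory sketch in Python. It is not VoltDB or Cassandra code. It assumes integer keys that are spread fairly evenly, and `BUCKET_WIDTH`, `BucketedStore`, and the hash function are all hypothetical names chosen for the example. Keys are normalized to a bucket id, the bucket id (not the raw key) is hash-sharded, and a range scan visits only the buckets that overlap the requested range.

```python
from collections import defaultdict

BUCKET_WIDTH = 100  # assumption: picked from prior knowledge of the key distribution


def bucket_of(key: int) -> int:
    """Normalize a key to the id of the bucket that contains it."""
    return key // BUCKET_WIDTH


def partition_of(bucket: int, num_partitions: int) -> int:
    """Hash-shard on the bucket id so neighbouring keys land together."""
    return (bucket * 2654435761) % (2 ** 32) % num_partitions


class BucketedStore:
    """Toy hash-sharded store: partitions[p][bucket] -> {key: value}."""

    def __init__(self, num_partitions: int = 4):
        self.num_partitions = num_partitions
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]

    def put(self, key: int, value) -> None:
        b = bucket_of(key)
        self.partitions[partition_of(b, self.num_partitions)][b][key] = value

    def scan(self, lo: int, hi: int):
        """Return (key, value) pairs with lo <= key < hi, sorted by key."""
        if lo >= hi:
            return []
        rows = []
        # Only buckets overlapping [lo, hi) are visited, one partition lookup each.
        for b in range(bucket_of(lo), bucket_of(hi - 1) + 1):
            part = self.partitions[partition_of(b, self.num_partitions)]
            bucket = part.get(b, {})
            rows.extend((k, v) for k, v in bucket.items() if lo <= k < hi)
        rows.sort(key=lambda kv: kv[0])
        return rows
```

    The cost of a scan here grows with the number of buckets in the range, which is why the bucket width has to be chosen with the key distribution in mind.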

    The second problem is that the column store model doesn't map well to the relational data model. I can gain quite a bit of speed and memory efficiency by packing small maps into a single row with a blob and rewriting it when k/v pairs are added/updated. That is how Redis handles small maps.

    For larger keys with many/larger k/v pairs it makes sense to denormalize and allow the database to manage the memory. With a little work you could make a stored procedure API that does this transparently.

    Again it's not insurmountable, but it isn't trivial either.
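    To make the packing approach for small maps concrete, here is a minimal Python sketch. It is only a toy model: `PackedRowTable` is a hypothetical name, the dict stands in for a table with one blob column, and JSON is an arbitrary choice of encoding. In VoltDB the read-modify-write in `update` would presumably live in a stored procedure so that it runs as one transaction.

```python
import json


class PackedRowTable:
    """Toy table: one row per key, with the whole field map packed into one blob."""

    def __init__(self):
        self.rows = {}  # primary key -> bytes (the packed map)

    def read(self, key: str) -> dict:
        """Unpack and return the field map for key (empty if the row is absent)."""
        blob = self.rows.get(key)
        return json.loads(blob) if blob is not None else {}

    def update(self, key: str, fields: dict) -> None:
        """Merge fields into the existing map and rewrite the whole blob."""
        current = self.read(key)
        current.update(fields)
        self.rows[key] = json.dumps(current, sort_keys=True).encode("utf-8")


table = PackedRowTable()
table.update("user1", {"field0": "a", "field1": "b"})
table.update("user1", {"field1": "c"})  # rewrites the single packed row
```

    Rewriting the whole blob on every update is cheap while the map is small. For larger maps the same `read`/`update` interface could instead store one row per k/v pair.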

    The third problem is that YCSB is written under the assumption that all logic exists on the client and that the server will have to materialize all the data for the client. This means that your real world application written against VoltDB could be several times faster and more space efficient. Faster because server side logic can eliminate several round trips to the client and more space efficient because support for transactions allows you to avoid writing your application in a log structured fashion.
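    The round-trip argument can be illustrated with a toy model in Python. Nothing here is the VoltDB client API: `ToyServer`, `call_procedure`, and the transfer example are hypothetical, and the counter simply counts how many times the client has to talk to the server.

```python
class ToyServer:
    """Toy server that counts every client call as one network round trip."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key, 0)

    def put(self, key, value):
        self.round_trips += 1
        self.data[key] = value

    def call_procedure(self, proc, *args):
        # The logic runs next to the data; the client pays for one call only.
        self.round_trips += 1
        return proc(self.data, *args)


def transfer_client_side(server, src, dst, amount):
    """All logic on the client: two reads and two writes, four round trips."""
    a = server.get(src)
    b = server.get(dst)
    server.put(src, a - amount)
    server.put(dst, b + amount)


def transfer_proc(data, src, dst, amount):
    """The same logic written as a server-side procedure."""
    data[src] = data.get(src, 0) - amount
    data[dst] = data.get(dst, 0) + amount


client_style = ToyServer()
client_style.data.update({"a": 100, "b": 0})
transfer_client_side(client_style, "a", "b", 30)

server_style = ToyServer()
server_style.data.update({"a": 100, "b": 0})
server_style.call_procedure(transfer_proc, "a", "b", 30)
```

    Both versions end in the same state, but the client-side version needs four round trips against one for the procedure call, and in a real system it would also need extra care to stay consistent under concurrent updates.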

    YCSB will give you a generic sense of how VoltDB performs and scales, but there are non-trivial gains to be had by writing your application in a manner that is appropriate for the relational data model and Volt's emphasis on server side logic.

    Regarding TPC-C: VoltDB was built specifically for a TPC-C-like benchmark. I say "like" because it isn't official and it differs from TPC-C in a few ways. The most significant difference is that new-order transactions use only a single warehouse (rather than the 1-10 warehouses that TPC-C requires for some % of new orders). This is significant because it allows the benchmark to shard perfectly, without any distributed transactions.
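    A small Python sketch shows why the single-warehouse restriction matters for sharding. It assumes the data is partitioned by warehouse id with a simple modulo, which is an assumption made for the example and not a description of how the actual benchmark is partitioned. The function names are hypothetical.

```python
def partition_of(warehouse_id: int, num_partitions: int) -> int:
    """Assumed sharding scheme: partition the data by warehouse id."""
    return warehouse_id % num_partitions


def partitions_touched(home_warehouse, supply_warehouses, num_partitions):
    """Set of partitions a new-order transaction has to involve."""
    touched = {partition_of(home_warehouse, num_partitions)}
    touched.update(partition_of(w, num_partitions) for w in supply_warehouses)
    return touched


def is_single_partition(home_warehouse, supply_warehouses, num_partitions):
    return len(partitions_touched(home_warehouse, supply_warehouses, num_partitions)) == 1


# Every order line is supplied by the home warehouse: one partition.
local_order = partitions_touched(1, [1, 1, 1], num_partitions=4)

# One order line is supplied by a remote warehouse: a distributed transaction.
remote_order = partitions_touched(1, [1, 2, 1], num_partitions=4)
```

    When every order line comes from the home warehouse, each transaction stays on one partition, so throughput can scale with the number of partitions without any cross-partition coordination.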

    The VoltDB TPC-C-like benchmark doesn't ship with the distribution, but it is available on GitHub.