Have the architectural challenges in NuoDB been addressed?


Refer to this video: https://youtu.be/NsI51Mo6r3o?t=18m48s

The video dates from September 2013, which is quite old in technology terms. However, it raised several challenges that NuoDB had at the time. I wonder whether NuoDB has improved on the following aspects:

  1. Race condition in the join process. If nodes are joined in the wrong order, they end up in a quiet split-brain mode and will lose data if rejoined later.
  2. Race conditions in database creation and schema operations.
  3. The system is tricky to configure and start in an automated way.
  4. When a node crashes, its storage manager or transaction manager is not brought back, which means data can suddenly become less durable, since you could be left with only 1 or 0 copies of the data.
  5. During a partition, transactions are blocked due to CPU/storage hauling resource.

Solution

  • Yeah, that was a while ago, but it was very helpful to our engineering team at the time. We did a lot of work to replicate those tests and fix the problems they exposed. It’s all written up in a series of blog posts. The best place to start is here:

    http://dev.nuodb.com/techblog/network-failure-handling-roundup

    It’s the umbrella post for the others, which build up to the full response.

    This next post was added a little later so it’s not linked in the above series, but it is still relevant:

    http://dev.nuodb.com/techblog/testing-network-failure-aws

    And with specific regard to your fourth point, about restarting crashed processes, NuoDB now has the concept of a Managed Database. That just means the database has a defined SLA that it adheres to automatically, ranging from Single Host, through Minimally Redundant and Multi-Host, to Geo-Distributed. The database will restart or replace lost processes automatically to keep meeting its SLA, and you can change the SLA while the database is running.
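
To make the "restart or replace lost processes to meet the SLA" idea concrete, here is a minimal sketch of a reconciliation check. It is not NuoDB's API or implementation: the SLA level names come from the answer above, but the `SLA_TARGETS` table, the process counts in it, and the `reconcile` function are all hypothetical and chosen only for illustration.

```python
# Hypothetical minimum process counts per SLA level. The level names are from
# the answer above; the numbers are illustrative assumptions, not NuoDB values.
SLA_TARGETS = {
    "Single Host": {"storage_manager": 1, "transaction_manager": 1},
    "Minimally Redundant": {"storage_manager": 2, "transaction_manager": 2},
}


def reconcile(sla, running):
    """Return how many processes of each kind must be started to meet the SLA.

    `running` maps a process kind to the number currently alive.
    """
    target = SLA_TARGETS[sla]
    return {
        kind: max(0, needed - running.get(kind, 0))
        for kind, needed in target.items()
    }


# A crash has left one storage manager and two transaction managers alive.
alive = {"storage_manager": 1, "transaction_manager": 2}
print(reconcile("Minimally Redundant", alive))
# {'storage_manager': 1, 'transaction_manager': 0}
```

Because the target is looked up by SLA name on every call, switching the SLA of a running database in this sketch is just a matter of calling `reconcile` with a different level.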