Search code examples
databasedatabase-designriakriak-kv

Can i add new nodes and increment n_val in Riak kv?


I intend to make a desktop app for a company that has multiple gyms at different locations,where a client who registered his membership at one gym should be able to access all other gyms with the same membership. That means creating/updating/deleting a client data in one gym must propagate to all other facilities. I think about using the Riak kv DB with the following properties:

  • number of nodes is equal to the number of gyms (one node in each gyms computer), so the data is replicated across all nodes.
  • n_val is equal to the number of nodes
  • r = w = 1 , since a gym might not have internet access at all times, so data updates can happen in at least one node.

Questions:

  1. Is this the right way to do it, if no, what database (type) should i use?
  2. When a new gym facility is created, is it possible to add it as a new node to the existing cluster, and thus incrementing the n_val by one, although the Riak docs advises against modifying the n_val of a bucket after its initial creation?.
  3. Since the gyms computers will be using an ISP internet access with no fixed public IP addresses, is it a good idea to connect them using a virtual private local network with something like hamachi for example?

Solution

  • That's a good set of questions. Before I answer them, I should probably mention that most people would approach this with a centralised system either somewhere online or hosted at the head office. This would be the definitive system for everything and all gyms would simply be clients. This way, any changes in member status etc. would only need to update the centralised system once and all future reads of that data would have the latest version.

    That said, the above mentioned system has multiple points of failure:

    a. If the centralised system goes down e.g. internet failure, hardware issues etc. then the entire gym network goes down.

    b. If the internet connection of an individual gym goes down then that gym cannot query or update the centralised system.

    This centralised method could be done with Riak or any other NoSQL database. To be fair, an SQL database such as MariaDB or Postgres should be able to handle the details provided in the scenario as well.

    Now we're through the boring bit, let's have some fun with Riak!

    1. Is this the right way to do it, if no, what database (type) should i use?

    Using Riak distributed between different sites is a great way to take advantage of Riak's design. However, the approach you suggest is slightly flawed. Although Riak is an awesome database, it needs very low latency times (read ping times) on the network between nodes in the same cluster. This is how it handles failed nodes seamlessly in the background - it has the ability to shift around data to create temporary vnodes on the remaining physical nodes that will handle the work of the failed nodes until they are restored.

    Unless you can afford dark fibre laid between each and every gym with all gyms located within 3 km (2 miles) of each other, let's look at one cluster per gym and then connect the gyms using Multi-DataCentre replication (MDC).

    With this approach, you would set up 1 Riak cluster per gym. In an ideal world, you would have 5 or more nodes per cluster per gym. However, given your use case, you could probably get away with 3 nodes per cluster per gym or, in a worse case scenario, 1 node per cluster per gym. Yes, you read that right, even a single Riak node can count as a Riak cluster.

    Once you have all your nodes set up, you would then use MDC to keep the data between clusters in sync. You would run realtime repl to sync transactions in real time (well, practically real time - a max of second or so lag including communicating between clusters) and fullsync which looks at the database on either side of MDC and then copies across any data present on one but missing on the other.

    You may ask why you would need fullsync if you have realtime sync. Fullsync makes sure that everything is in sync. If there is a hiccup in your internet connection or somebody reboots your entire cluster at the same time then there will be a brief interruption to realtime sync processing and data from one end will not make it to the other for that period. Fullsync covers eventualities such as this.

    Depending on the number of gyms you have, you would need to decide whether you want to use a ring-based replication method (A->B->C->D->E->A) where data cascades around the ring or a star based replication (all clusters connect to all other clusters). The former method is slower but scales better. The latter method is much faster but can become unsustainable at scale depending on how many sites and how much data is pumped through. If it's a few hundred transactions per day then the star based method may work better for you but if it's hundreds of thousands or millions of transactions, you would probably want to look at ring-based replication if you have more than 5 sites.

    1. When a new gym facility is created, is it possible to add it as a new node to the existing cluster, and thus incrementing the n_val by one, although the Riak docs advises against modifying the n_val of a bucket after its initial creation?

    Given the explanation above, n_val is the setting inside a cluster whereas this project should really be approached by using MDC.

    For reference, yes, you can modify the n_val of a bucket after its initial creation but you would then have to run a partition repair on the partition containing the bucket to generate the extra copy. This is not exactly efficient usage of Riak and is generally not recommended.

    1. Since the gyms computers will be using an ISP internet access with no fixed public IP addresses, is it a good idea to connect them using a virtual private local network with something like hamachi for example?

    Yes, when syncing data between Riak clusters, please do it over a VPN. This will have the major benefit of giving your Riak nodes static IPs as well as keeping all your data secure over the internet.

    If you can't use a VPN, you should use SSL to encrypt your MDC connections and use a robust firewall to prevent any other access from the internet as an absolute minimum. As you mention that the connections are not static IPs, you would want to set your Riak clusters up with a Fully Qualified Domain Name (FQDN) and use a dynamic domain name service such as DynDNS to have the IP address automatically updated so that your FQDN always points to the right IP irrespectively of how often it changes.


    Additional Notes:

    I) With your local gym clusters, ideally they should be multiple nodes on separate hardware. That way if one node goes down, the others can continue. If there are budgeting restraints, multiple Virtual Machines or multiple Docker instances are a viable alternative. However, you will want to make sure that each VM or each Docker instance has a physically separate hard disk attached to it. The logic behind this is that Riak can have a lot of I/O per node, especially if one of the nodes in the cluster has just failed. Generally I/O is one of the biggest blocking factors in Riak. If this I/O is multiplied by two or more nodes trying to access the same physical disk (even though it may be in different partitions), the combined slow down effect on Riak is far from desirable. It also means that should a HDD failure occur on one drive, it would not affect the other nodes in the cluster.

    Please note that if you opt for a VM or Docker approach, if the host machine that runs the VMs or Docker goes offline then all the Riak nodes attached to it will also go offline. If worried about this, Riak does work with pretty cheap commodity hardware though. I personally have a 5 node Riak cluster running on Raspberry Pi 4Bs.

    II) Riak is fast because it is severely stripped down compared to an SQL database. For example search functionality in Riak is limited to secondary indexes (2i), map reduce and, the somewhat inefficient key listing. Although 2i and map reduce can be very powerful, it is often better to think ahead about what kind of search functionality you may need and then create a datatype in a set for exactly that function and have your application update it on a write cycle.

    For example, if you wanted to send your gym members a birthday card on their birthday, a key listing on the date of birth field in either a map or across key/value pairs would result in Riak reading every single key in the database and then filtering out the ones you did not ask for. Highly inefficient. 2i would handle this slightly better as it would only look at the keys marked with the given index but it still involves crawling over multiple keys. However, if, as part of your member sign up process, you had your application write to, for example, a CRDT set called birthdays with a datatype for every month/day combination and then the values stored in there being the member numbers with that birthday then you don't even need to search any more. To get all the birthdays for a specific date, you just tell Riak to return birthdays/0131 and it would tell you the member number of all members with birthdays on 31st January. As that is a direct read operation rather than a search operation, it will return faster than any query would. This does require a bit more planning of how your application will interface with Riak but for the minor speed cost of a little duplication on write compared to a significant speed benefit on read it is well worth it.

    III) Remember to run Riak on a different box/VM/Docker instance to your application. Riak competes very badly for system resources compared to other applications. This can cause Riak to run very badly if run, for example, on the same box as a web server. As such, Riak runs best when it has sole access to all system resources.

    IV) For reliability, I would recommend trying to attain a minimum of 3 nodes per cluster and a load balancer between them and your application server. This would mean that in the event of a single node failing, the load balancer should realise this and route all queries to the remaining nodes. In the same way, the remaining nodes would spin up temporary vnodes to handle the work that should have been done by the failed node. In combination, it should mean that even in the event of internet outage and one physical node failing, the gym should still be able to function as if nothing was wrong from the end user point of view.

    V) Again, plan well. NoSQL databases have a lot of advantages over relational databases when it comes to speed, scaling and fault tolerance but it comes at the cost of flexibility. Once you have set everything up in Riak and started populating with user data, it is hard to change populated data from one type to another. As such, plan your setup carefully. Think about what CRDTs you would like to use as they are more flexible to handle values that often change e.g. counters and logging applications. Maybe you have values that change very rarely. In that case you might prefer to use key/value pairs for those as key/value pairs are much faster to read and write. Also consider which data you want to duplicate in what way e.g. the birthdays example from earlier.

    VI) Although planning related again, look carefully at the features offered by the different possible backends for Riak and the two different types of Active Anti-Entropy (AAE). If unsure as to which to use, we generally recommend Leveled as the backend with TicTacAAE as this combination should generally suit all but the most specific of needs.

    If you need packages, go to https://files.tiot.jp/riak/kv/ and then choose the package you need. Despite 3.2.0 sounding like the latest release, it is actually 3.0.16 at time of writing. 3.2.1 should be released later this year.

    If you need reference documentation, https://www.tiot.jp/riak-docs/ has, at time of writing, the most up to date documentation.