Tags: cloud, persistence, datastore, redundancy, data-recovery

Looking for a solution to cloud data storage persistence issues


I'm looking for a cloud data storage service which offers the following:

  • Data is stored in duplicate (or more)
  • Data is identical between the original and the duplicate(s) at all times (i.e. syncing original->duplicate is effectively instant, or storage requests don't return until every instance has acknowledged the write)
  • If the original fails, we should be able to use the duplicate(s) as if they were the original

So my specific question is:

  • Does such a storage solution exist? If so, where can we find it?
  • And if not, are there any best practices for handling, in code, a duplicate instance that is missing data from the original?

Many cloud services offer some form of persistence and replication, but there is usually a delay or synchronisation window between the instances, which in many cases means the duplicate does not yet contain all of the data of the original. This window is often on the order of a few seconds to a few minutes, but even such a small time-frame can be quite significant. We're looking to eliminate this delay entirely.
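
To make the requirement concrete, below is a rough sketch of the kind of synchronous dual-write we would otherwise have to implement ourselves. It is only an illustration (Python/redis-py for brevity, placeholder host names; our real servers are C# Webroles), not something we have in production:

    import redis

    # Placeholder endpoints: two independent caches, not real host names.
    primary = redis.Redis(host="matchmaking-primary.example")
    replica = redis.Redis(host="matchmaking-replica.example")

    def store_match(key, value):
        """Write to both caches; only report success after both acknowledge.

        This gives the "requests don't return until all instances have the
        data" behaviour described above, at the cost of doubling write
        latency; if either write fails, the exception propagates to the
        caller.
        """
        primary.set(key, value)
        replica.set(key, value)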

Background:
Currently I'm working on a matchmaking system for an online game. This system must be very reliable and have as little downtime as possible. So far our setup has been to use any number of servers and have them all connect to the same storage unit so they can all work with the same dataset. Specifically, our servers are currently Azure Webroles and our storage unit is an Azure Redis cache. However, Redis suffers from the same issue described above (a delay of ~1 s), so we're looking for alternatives.


Solution

  • There is a pretty extensive article on this hosted on the Redis site:

    Redis Persistence

    We personally use a combination of RDB and AOF on our servers. The benefit of this is that every write operation is logged in the append-only file alongside the smaller snapshots that are periodically written to disk, which is great for backing up data. The downsides are that more storage space is required and there is a small performance hit, depending on how you configure AOF. There is an "everysec" option, which flushes the AOF buffer to disk once per second and is a good balance between speed and data integrity.
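
    For reference, a minimal redis.conf sketch of that combination (the directives are standard Redis persistence options; the snapshot intervals shown are just the stock defaults, not a recommendation for this particular workload):

        # RDB: periodic point-in-time snapshots (stock default intervals).
        save 900 1
        save 300 10
        save 60 10000

        # AOF: also log every write operation.
        appendonly yes

        # Flush the AOF buffer to disk once per second, so at most ~1 second
        # of writes can be lost on a crash, with far less overhead than "always".
        appendfsync everysec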