
Can Cassandra be used to both replicate, provide a 'master' and filter data at sites?


I'm researching tech for a new project.

We have a number of sites whose data should come from a master server. Each site may only hold the data that is relevant to the owner (company) of that site. Each site may have a number of independent machines, each with its own DB.

The master would have all data for all sites/machines. Offline usage with sporadic connectivity is expected.

I can do this with an RDBMS using something like SymmetricDS, e.g., by setting up per-site replication so that each site only receives the data relevant to it.

What that doesn't give me (at least not automatically) is replication of a write on one local machine to the other machines at the same site. This matters in the offline case: a write to machine A at a site won't be applied to machine B at that site. The write will reach B through SymmetricDS replication once the connection comes back, but I need something that also works locally while the connection is down.

I'm wondering whether something like Cassandra is suitable for this.

I'm thinking:

  • Each site is a set of replicas for itself (even if that site is just one machine)
  • Each site replicates data to a master site
  • Writes at the master are replicated to relevant sites (based on the content of the data)

Solution

  • Out of the box, Cassandra has no such functionality. In theory, you could build something like what you need using DSE's Advanced Replication feature, but getting a correct solution would take experimentation, and if the routing decision is based only on the content of the data, it may not work well even then.

    Default cross-DC replication in Cassandra can tolerate downtime, such as a site losing its link to the master, especially if your software uses appropriate consistency levels when writing and reading data. However, all data is replicated to all DCs unless you separate it into different keyspaces, each of which is replicated to two DCs: the master as one DC and an individual site as the second DC.
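A minimal sketch of the keyspace-per-site layout and why local consistency levels matter offline, assuming each site is its own Cassandra data center in one cluster. The DC names (`master`, `site_a`, `site_b`), replica counts, and keyspace naming (`<site>_data`) are hypothetical, and `write_succeeds` is a simplified model of replica availability, not driver code. The CQL string uses the standard `NetworkTopologyStrategy` replication syntax.

```python
# Hypothetical topology: DC names must match what the cluster's snitch reports.
MASTER_DC = "master"
MASTER_RF = 3
SITE_DCS = {"site_a": 2, "site_b": 1}  # site DC -> number of replicas (machines) at that site


def site_keyspace_cql(site_dc: str, site_rf: int, master_rf: int = MASTER_RF) -> str:
    """CQL for a keyspace replicated only to the master DC and one site DC,
    so other sites never receive this site's data."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {site_dc}_data WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{MASTER_DC}': {master_rf}, "
        f"'{site_dc}': {site_rf}}};"
    )


def quorum(rf: int) -> int:
    """Replicas needed for a quorum of rf replicas."""
    return rf // 2 + 1


def write_succeeds(replication: dict, local_dc: str, live: dict, cl: str) -> bool:
    """Simplified model: can a coordinator in local_dc meet the consistency
    level, given the number of reachable replicas per DC for this keyspace?"""
    local_live = live.get(local_dc, 0)
    if cl == "LOCAL_ONE":
        return local_live >= 1
    if cl == "LOCAL_QUORUM":
        return local_live >= quorum(replication.get(local_dc, 0))
    if cl == "EACH_QUORUM":
        return all(live.get(dc, 0) >= quorum(rf) for dc, rf in replication.items())
    if cl == "QUORUM":
        total_live = sum(live.get(dc, 0) for dc in replication)
        return total_live >= quorum(sum(replication.values()))
    raise ValueError(f"unsupported consistency level: {cl}")


if __name__ == "__main__":
    for dc, rf in SITE_DCS.items():
        print(site_keyspace_cql(dc, rf))

    # site_a has lost its link to the master: both local machines are up.
    rep = {MASTER_DC: MASTER_RF, "site_a": SITE_DCS["site_a"]}
    offline = {"site_a": 2, MASTER_DC: 0}
    for cl in ("LOCAL_ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM"):
        print(cl, write_succeeds(rep, "site_a", offline, cl))
```

In this model, `LOCAL_ONE` and `LOCAL_QUORUM` writes at an offline site still succeed and replicate between that site's own machines, while `QUORUM` and `EACH_QUORUM` fail because they need master replicas. Note that with 2 replicas at a site, `LOCAL_QUORUM` needs both machines up. Replicas that missed writes (here the master) catch up via hinted handoff, but hints are only kept for `max_hint_window` (3 hours by default), so longer disconnections would need a repair after reconnecting.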