
In ArangoDB, can we do geo-distributed sharding?


The use case is that we have customers in the US, the EU and China. Due to legislation, EU customers' data must not be stored outside the EU, and US customers' data must not be stored in China. For performance, data should be as close to the customer as possible and replicated as widely as possible. So the plan is to split the customer data into 3 shards, customer_us, customer_eu and customer_cn, and to have 3 data centers, California, Beijing and Geneva, such that:

  • Beijing holds customer_cn
  • California holds customer_cn and customer_us
  • Geneva holds customer_cn, customer_us and customer_eu
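The placement plan above can be written down as plain data and checked against the two legal rules. This is only a minimal sketch: the names `DC_REGION`, `PLACEMENT`, `FORBIDDEN_REGIONS`, `violations` and `copies_per_shard` are made up for illustration, Geneva is treated as the EU site because the plan does so, and nothing here talks to ArangoDB or any other database.

```python
# Region each data center sits in, as the plan above treats it (Geneva is the EU site).
DC_REGION = {"Beijing": "CN", "California": "US", "Geneva": "EU"}

# Shards held by each data center, copied from the list above.
PLACEMENT = {
    "Beijing": {"customer_cn"},
    "California": {"customer_cn", "customer_us"},
    "Geneva": {"customer_cn", "customer_us", "customer_eu"},
}

# Hypothetical encoding of the two legal rules: regions a shard must never be stored in.
FORBIDDEN_REGIONS = {
    "customer_eu": {"US", "CN"},  # EU data must not leave the EU
    "customer_us": {"CN"},        # US data must not be stored in China
    "customer_cn": set(),         # no restriction stated in the question
}


def violations(placement):
    """Return (data_center, shard) pairs that break a residency rule."""
    return sorted(
        (dc, shard)
        for dc, shards in placement.items()
        for shard in shards
        if DC_REGION[dc] in FORBIDDEN_REGIONS[shard]
    )


def copies_per_shard(placement):
    """Count how many data centers hold each shard."""
    counts = {}
    for shards in placement.values():
        for shard in shards:
            counts[shard] = counts.get(shard, 0) + 1
    return counts


if __name__ == "__main__":
    print(violations(PLACEMENT))
    print(copies_per_shard(PLACEMENT))
```

Under these assumptions the plan breaks neither rule, and the copy counts show that customer_cn is held in three data centers, customer_us in two, and customer_eu only in Geneva.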

The whole example is taken from http://orientdb.com/docs/2.1/Distributed-Sharding.html , but OrientDB has the concepts of Class and Inheritance. AFAIK, ArangoDB shards by shard key. My questions are:

  1. Can we do the same sharding using an ArangoDB shard key, and if so, how do we configure sharding and replication to achieve it?
  2. If we have associated data, say "invoice", where each invoice belongs to exactly one customer, can we somehow distribute it automatically in the same manner without managing the shard key ourselves?

Solution

  • ArangoDB currently doesn't offer datacenter awareness or a zone concept. For now, only a setup of disconnected instances, possibly combined with replication, may get you partway to what you want.

    You would create several databases, such as db_cn, and replicate them to the different data centers. The replication slave gives you a read-only copy.
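One way to picture this per-database layout is a small planning sketch that lists, for each database, which site would act as master and which sites would hold read-only slave copies. It is illustrative only: the answer names just db_cn, so `db_us` and `db_eu` are assumed names, the choice of master site is an assumption, and `replication_plan` is a hypothetical helper that does not call any ArangoDB API.

```python
# Hypothetical database names; the answer only names db_cn, so db_us and db_eu are assumptions.
# Each entry lists the data centers allowed to hold that database, following the question's plan.
ALLOWED_SITES = {
    "db_cn": ["Beijing", "California", "Geneva"],
    "db_us": ["California", "Geneva"],
    "db_eu": ["Geneva"],
}

# Assumption: the data center closest to the customers takes the writes (master).
HOME_SITE = {"db_cn": "Beijing", "db_us": "California", "db_eu": "Geneva"}


def replication_plan(allowed_sites, home_site):
    """Map each database to its master site and its read-only slave sites."""
    plan = {}
    for db, sites in allowed_sites.items():
        master = home_site[db]
        if master not in sites:
            raise ValueError(f"{db}: master {master} is not an allowed site")
        # Every other allowed site would receive a read-only replicated copy.
        plan[db] = {"master": master, "slaves": [s for s in sites if s != master]}
    return plan


if __name__ == "__main__":
    for db, roles in replication_plan(ALLOWED_SITES, HOME_SITE).items():
        print(db, roles)
```

With these inputs, db_eu ends up with a master in Geneva and no slave at all, so the legal constraint leaves it without a second copy unless another EU site is added.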

    We may see an implementation of this in 2016. Since such a feature is pretty specialized (and the latency of accessing data from another datacenter may be very high), what's your use case? Would you like to contact us at hackers at arangodb.com?