Dynamically adding a shard key in ArangoDB


I'm setting up a clustered database with ArangoDB, and I need to use indexes on my collections.

Suppose we have a collection named myCollection that was created with the shard key _key.

Let myVariable be the unique key of myCollection, so I have a unique constraint on myVariable.

myCollection already exists and contains data.

I don't want to delete everything, recreate myCollection with myVariable as a new shard key, and then restore the data. So I need to add a new shard key dynamically, while myCollection already exists.

Is this possible? Can I somehow add a new shard key?

I mean adding a key to the _shardBy label without recreating the collection.

Thanks for your help.


Solution

  • No, changing the shard key after creation is not supported. If you look at the consequences this would have, it's easy to understand why:

    The shard key tells the coordinator which cluster node each document should end up on. Conversely, the coordinator can use the shard key to predict where to look for a document. That assumption would break if you changed the condition to an arbitrary new one, so every document that no longer matches would have to be moved to its correct new shard.
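
    To make that concrete, here is a toy model of shard routing. It is only an illustration: `toyHash` and `shardFor` are made-up helpers, and ArangoDB's real hash function and shard placement are different. The point is just that the same document can map to a different shard once the shard key changes.

```javascript
// Toy model only: ArangoDB's real hash function and shard layout differ.
function toyHash(text) {
  let h = 0;
  for (const ch of text) {
    // Multiply-and-add, truncated to an unsigned 32-bit integer.
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// Pick a shard number from the values of the shard key attributes.
function shardFor(doc, shardKeys, numberOfShards) {
  const material = shardKeys.map((key) => String(doc[key])).join("|");
  return toyHash(material) % numberOfShards;
}

const doc = { _key: "a", myVariable: "b" };
const byKey = shardFor(doc, ["_key"], 4);
const byVariable = shardFor(doc, ["myVariable"], 4);

// The two values differ, so this document would have to move.
console.log({ byKey, byVariable });
```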

    As you can see, you need to touch all documents anyway. So if you don't want to download all the data to the client, some JavaScript on the coordinator, such as a Foxx service, could fill the gap:

    • create the new collection with the proper shard key
    • fetch all _keys into memory
    • issue repeated AQL queries that each select a range from the old collection and insert it into the new one
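
    The three steps above could be sketched roughly like this, for arangosh or a Foxx service. Treat it as a starting point rather than a tested migration: the function name, its parameters and the batch size are my own choices, and it selects each batch by a list of keys instead of a key range. It also assumes the new collection generates its own keys, so the old `_key`, `_id` and `_rev` are dropped on insert.

```javascript
// Sketch only. In arangosh or Foxx the handle would come from:
//   const db = require("@arangodb").db;
function copyToReshardedCollection(db, oldName, newName, shardKey, numberOfShards, batchSize) {
  // 1. Create the new collection with the proper shard key.
  db._create(newName, { numberOfShards: numberOfShards, shardKeys: [shardKey] });

  // 2. Fetch all _keys of the old collection into memory.
  const keys = db._query(
    "FOR d IN @@src RETURN d._key",
    { "@src": oldName }
  ).toArray();

  // 3. Copy the documents batch by batch with repeated AQL queries.
  let copied = 0;
  for (let i = 0; i < keys.length; i += batchSize) {
    const batch = keys.slice(i, i + batchSize);
    db._query(
      "FOR d IN @@src FILTER d._key IN @keys " +
      // Assumption: the old system attributes are not carried over.
      "INSERT UNSET(d, '_key', '_id', '_rev') INTO @@dst",
      { "@src": oldName, "@dst": newName, keys: batch }
    );
    copied += batch.length;
  }
  return copied; // number of keys processed
}
```

    A call might look like `copyToReshardedCollection(db, "myCollection", "myCollectionNew", "myVariable", 4, 1000)`. The target name, shard count and batch size here are placeholders. If other collections refer to the old `_key` values, you would need to keep them in a regular attribute, because this sketch discards them.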

    You may want to start an additional coordinator if you don't want to use your existing setup for this.

    Hint: An upgrade to ArangoDB 3.0 will require a dump/restore cycle anyway, so if you can postpone your problem a little, you may be able to solve it then.