Search code examples
partitioningshardingcrate

Crate database - relation between shards and partitions and nodes


I am new to crate database and trying to understand relation between shards, partitions and nodes.

  1. How many partitions corresponds to one shard?
  2. Can I configure to send data of a particular partition to be part of particular shard?
  3. Can I configure to send a particular shard to particular node ?
  4. Can I move a shard from one node to another in crate cluster as it can be done in elasticsearch?
  5. Can I have different number of replicas per shard ?

Usecase is to keep the latest data on few of my best performing nodes with more replicas and older data on not so good hardware with 0 or just 1 replica.


Solution

  • Shards are the smallest "unit of data" Crate has and a table should have an appropriate amount (not an exact science, I know) to distribute the data (and by extent the workload) evenly across the cluster, since this is done within those shards. Currently there is no direct control over placement (on which node) and replication of particular shards.

    How many partitions corresponds to one shard?

    Actually it's the other way around: A partition has a few shards, since a partition is treated like a "sub-table" with a subset of data in it. A partition is created from the original CREATE TABLE statement (it is used as a template) and can therefore even have a different shard count than other partitions.

    Can I configure to send data of a particular partition to be part of particular shard?

    Well, no - not explicitly. The shard management is handled in the background by a magic algorithm :) Controlling the partition a row resides in is as simple as updating the partition column's value.

    Can I configure to send a particular shard to particular node ?

    No. There are knobs in the configuration to control the # of shards on a node in general: https://crate.io/docs/reference/configuration.html#allocation but it's not recommended to change these setting unless you know exactly what you are doing ;)

    Can I move a shard from one node to another in crate cluster as it can be done in elasticsearch?

    No, not explicitly.

    Can I have different number of replicas per shard ?

    No, replicas are a per-table setting and the whole table will be affected.

    Usecase is to keep the latest data on few of my best performing nodes with more replicas and older data on not so good hardware with 0 or just 1 replica.

    For this use case I would recommend using either a second table (you cannot control on what machine the data is stored though), or - if you don't need to query the data - use your old machines to store the snapshots: https://crate.io/a/backing-up-and-restoring-crate/ and restore it when needed.

    Cheers, Claus