Search code examples
algorithmrelational-databasedatabase-partitioningconsistent-hashing

Is relational database able to leverage the way of consistent hashing to do the partition table?


Assume we have a user table to be partitioned by user id as integer 1,2,3...n . Can I use the way of consistent hashing used to partition the table?

The benefit would be if the number of partitions is increased or decreased, old index can be the same.

Question A.

Is it a good idea to use consistent hashing algorithm to do the partition table?

Question B.

Any relational database has this built in supported?

I guess some nosql database already use it.

But database here refer to relational database.

I just encountered this question in an interview. In the first reaction I just answered mod by length, but then be challenged if partitioning the table into more pieces, then it will cause problems.


Solution

  • About oracle hash partition, part from oracle help doc

    After some research, oracle actually do support consistent hashing by the default hash partitioning. Though how it did is a secret and not published. But it actually leverage the way HashMap, but hidden some partitions. So when you add/remove partition, very less work for oracle to adjust the data in different partitions. The algorithms only ensures evenly splitting data into partitions of numbers power of 2 such as 4. So if it's not, then merge/split some partitions.

    The magic is like if to increase from four partitions to five, it actually spilts one partition into two. If to decrease from four partitions into three, it actually merges two partitions into one.

    If anyone has more insight, add a more detailed answer.

    Hash Partitioning Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.

    Hash partitioning is the ideal method for distributing data evenly across devices. Hash partitioning is also an easy-to-use alternative to range partitioning, especially when the data to be partitioned is not historical or has no obvious partitioning key.

    Note:

    You cannot change the hashing algorithms used by partitioning.

    About MYSQL hash partition, part from mysql help doc

    It provides two partition function One is partition by HASH. The other is partition by KEY.

    Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. MySQL Cluster uses MD5() for this purpose; for tables using other storage engines, the server employs its own internal hashing function which is based on the same algorithm as PASSWORD(). The syntax rules for CREATE TABLE ... PARTITION BY KEY are similar to those for creating a table that is partitioned by hash.

    The major differences are listed here:

    •KEY is used rather than HASH.

    •KEY takes only a list of one or more column names. Beginning with MySQL 5.1.5, the column or columns used as the partitioning key must comprise part or all of the table's primary key, if the table has one.

    CREATE TABLE k1 (
        id INT NOT NULL PRIMARY KEY,
        name VARCHAR(20)
    )
    PARTITION BY KEY()
    PARTITIONS 2;
    

    If there is no primary key but there is a unique key, then the unique key is used for the partitioning key:

    CREATE TABLE k1 (
        id INT NOT NULL,
        name VARCHAR(20),
        UNIQUE KEY (id)
    )
    PARTITION BY KEY()
    PARTITIONS 2;
    

    However, if the unique key column were not defined as NOT NULL, then the previous statement would fail.

    But it don't tell how it partitions, will have to look into code.