I am trying to get familiar with MySQL Cluster Community Server (version:5.6.27-ndb-7.4.8-cluster-gpl) and I have the first question that puzzles me. I searched through the documentation and the forum but didn't find something relevant.
I have a very simple table on a cluster with 4 data nodes/partitions:
CREATE TABLE customer (
id int(10) NOT NULL ,
surname varchar(35) NOT NULL,
gender varchar(6) NOT null,
primary key(id, surname, gender)
)ENGINE=NDBCLUSTER DEFAULT CHARSET=latin1 PARTITION by key (gender);
I therefore choose to have a partition by key on gender (it takes the values Male/Female). I insert 1000 rows and I want to see how they are distributed:
SELECT partition_name, table_rows
FROM information_schema.PARTITIONS
WHERE table_name='customer';
Results:
'p0', '0'
'p1', '1000'
'p2', '0'
'p3', '0'
So all the rows go into a single partition.
However if I define gender as nvarchar(6) or varchar(40) the rows are distributed like I would expect in two partitions
'p0', '493'
'p1', '0'
'p2', '507'
'p3', '0'
If I raise gender to varchar(60), all records go into a single partition. If I raise it even more to varchar(100), the records are distributed evenly between two partitions.
Is there any logic behind this or am I doing something completely wrong?
The doc says,
Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. This internal hashing function is based on the same algorithm as PASSWORD().
MySQL is running your two values (Male
and Female
) for that column through an arbitrary, and to you the implementer unpredictable, hash function. Under some circumstances that hash function yields the same output for those two values, and under some other circumstances it yields different values. So, sometimes all your rows end up in a particular partition, and sometimes they end up in two partitions.
A column like the one you've chosen, with a low number of distinct values, isn't (said Obvious Man) a great choice for hash or key partitioning. Range partitioning might be more suitable.