How are duplicate keys handled in InnoDB's implementation of B+ tree for it's indexes.
For example, if there is a table with 1 million rows having a column with cardinality of 10. If we create an index on this column, how will the resulting B+ tree would look like?
Will it just have 10 keys and the value of each key is the list of primary keys which belong to that key (if yes, in what structure? Linked list?) or will it have 1M keys (if yes, then B+ tree would have to be handled differently)?
In some sense, an InnoDB BTree has no duplicates. This is because the columns of the PRIMARY KEY
are appended to the columns specified for a secondary key. That leads to a fully-ordered list.
When you lookup via a secondary key (or the initial part of a key), the query will drill down the BTree to find the first row in the index matching what you gave, then scan forward to get any others. To get the rest of the columns, it takes the PRIMARY KEY
columns to do a second BTree lookup.
The Optimizer will rarely use an index with "low cardinality". For example, a yes/no or true/false or male/female column should not be indexed. The Optimizer would find it faster to simply scan the table rather than bounce back and forth between the index and (via the PK columns) the main BTree.
The cutoff for when to use the index versus punting is somewhere around 20%, depending on the phase of the moon.