Search code examples
databaseindexingb-tree

How multiple column b-tree index is organized


I want to understand better index organization. Imagine we have a table with 2 columns:

CREATE TABLE user( 
  name varchar(100)
 ,age int)

We would like to create an index:

CREATE INDEX IDX_MultiColIdx on user(name,age)

How would B-Tree index organization look like?

In case of one column, say, age, the organization is clear: every non-leaf node would contain a set of integer keys which would be used for a search. Which values contain nodes of our IDX_MultiColIdx B-Tree index?


Solution

  • Which values contains nodes of our IDX_MultiColIdx B-Tree index?

    Values of name, age and the row pointer (RID/ROWID or clustered key, depending on the table organization), sorted lexicographically.

    How exactly they will be stored, depends on the datatype and database system.

    Usually, CHAR is stored right-padded with spaces up to its size, while VARCHAR is prepended with its length.

    MyISAM and some other engines can use key compression: the matching parts of a set of keys are only stored once, and the other keys only store the differing parts, like this:

    Hamblin
    Hamblin, California
    Hamblin (surname)
    Hambling Baronets
    Hambly
    Hambly Arena    
    Hambly Arena Fire
    Hambo
    Hambo Lama Itigelov
    Hambok
    Hambone
    

    will be stored as:

    Hamblin
    [7], California
    [7] (surname)
    [7]g Baronets
    Hambly
    [6] Arena   
    [6] Arena Fire
    Hambo
    [5] Lama Itigelov
    [5]k
    [5]ne
    

    , where [x] means "take leading x characters from the previous key"