Search code examples
mysqlsqlindexingb-treeclustered-index

Where clustered and unclustered index of B+ tree are saved?


Currently I am reading about B+ Tree basics, and got confused regarding space allocation for clustered and unclustered index.

When we create clustered index on B+ tree, the index get stored in the main memory and the leaves contain the data pointers to actual blocks. The blocks are stored in disks, and the blocks contain record.

  • Usually the clustered index is created on primary key
  • There can be only one clustered index

Now suppose we have a table (id, name, class) and I created two unclustered indexes on name and class. My doubt is where will the unclustered index will get stored? and how searching will be performed for a query like

select id, name, class from table where id = 3, name='Leo' and class='10'

enter image description here

My assumption:

  • Since id field is primary key so first using clustered index will id = 3
  • Now using the unclustered index on name and class, we will find the remaining fields

Do you think my assumption is right? Could you elaborate more regarding storing the clustered index? Does the both the index (clustered and non clustered form a n-ary tree? ). I am not able to visualize both the clustered and unclustered index together.


Solution

  • I am speaking specifically about InnoDB...

    The PRIMARY KEY is (as you say) clustered with the data. The entire BTree for that (data + PK) is stored in one set of blocks on disk (not 'main memory'). The 'leaf' nodes contain all the columns.

    A secondary key is a separate BTree. Structurally the two BTrees are the same with the exception of what is in the leaf nodes. For a secondary key, a copy of the PRIMARY KEY is put into to the leaf nodes. Hence, when looking up one row ("point query") using a secondary index, there are two BTree drill-downs - one for the secondary index, and a secondary for the PK.

    All blocks are 'cached' in the 'buffer_pool', so they are sometimes in main memory, but are always persisted (sooner or later) on disk. (The transaction log, etc) assure that "later" does not violate the rule that the data always persists.)

    Your two pictures are a nice start. However...

    • The non-leaf nodes are linked together (as you show), but they are not necessarily adjacent on disk. When inserting a new row (or new index entry), a block could 'split' because of being full. This leads to the blocks being scattered around the disk.
    • The leaf nodes are also linked together, but could be scattered.
    • For Unclustered, well, suggest you start over, taking into account the PK issue, etc.

    What you need to know at a higher level than the pictures are trying to convey:

    • A point query drills down a btree
    • A secondary lookup has to do 2 drill downs
    • "Range" scans -- either of the data, or of an index -- are very efficient because they scan through one block and then move on to the (logically) next block via a bidirectional link between blocks at the bottom level. Hence, it is really a B+Tree, not just a BTree.
    • (more on range) WHERE clustered_key BETWEEN ... is very efficient
    • (more on range) WHERE secondary_key BETWEEN ... is very efficient at finding the PK values it needs, but then turns into a bunch of (potentially) random point queries.
    • All blocks are pretty much equivalent for caching. But (obviously?) the non-leaf nodes tend to live in the cache because of "least recently used" algorithm. (I am leaving out a lot of details.)
    • There can be only one Clustered index. (Unless you are willing to duplicate all the data. This has been done in a couple of Engines other than InnoDB.)
    • A block contains as many 'records' (data or index or non-leaf) as is practical at them moment -- anywhere from 1 to hundreds.
    • By default, a block is 16KB. (And not easy to change.)
    • With innodb_file_per_table=ON, all BTrees for a given table live in a single .ibd 'tablespace'.
    • With innodb_file_per_table=OFF, all BTrees for all tables live in a single global 'tablespace' called ibdata1. (Again, over-simplified.)

    Now for MyISAM:

    • The data for one table lives in a file (.MYD).
    • All the indexes (including the PRIMARY KEY) for one table live in a file (.MYI)
    • All the indexes are BTrees. (The data is not.)
    • The leaf nodes of an index 'point' into the data file.
    • Index blocks are 1KB.
    • The data file is simply a random-access stream.

    (There are a lot more details.)