I was reading ebook chapter about indexes, and indexing strategies, many of these aspects I already know, but I stucked on clustered indexes in InnoDB, here is the quote:
Clustering gives the largest improvement for I/O-bound workloads. If the data fits in memory the order in which it’s accessed doesn’t really matter, so clustering doesn’t give much benefit.
I belive that this is truth, but how am I supposed to guess if the data would fit in memory? How the database decide when to process the data in-memory, and when not?
Let's say we have a table Emp with columns ID, Name, and Phone filled with 100 000 records
If, one example, I will put the clustered index on the ID column, and perform this query
SELECT * FROM Employee;
How do I know if this will use a benefits from clustered index?
It's somehow relative to this thread Difference between In memory databases and disk memory database
but yet I am not sure how the database will behave
Your example might be 20MB.
"In memory" really means "in the InnoDB buffer_pool", whose size is controlled by innodb_buffer_pool_size
, which should be set to about 70% of available RAM.
If your query hits the disk instead of finding everything cached in the buffer_pool, it will run (this is just a Rule of Thumb) 10 times as slow.
What you are saying on "clustered index" is misleading. Let me turn things around...
PRIMARY KEY
.UNIQUE
.id INT UNSIGNED NOT NULL AUTO_INCREMENT
.The real question is not whether something is clustered, but whether it is cached in RAM. (Remember the 10x RoT.)
PRIMARY KEY
or other type of INDEX
.)How the database decide when to process the data in-memory, and when not?
That's 'wrong', too. All processing is in memory. On a block-by-block basis, pieces of the tables and indexes are moved into / out of the buffer_pool. A block (in InnoDB) is 16KB. And the buffer_pool is a "cache" of such blocks.
SELECT * FROM Employee;
is simple, but costly. It operates thus:
Employee
(if not already open -- a different 'cache' handles this).Things get more interesting if you have a WHERE
clause. And then it depends on whether the PK or some other INDEX
is involved.
Etc, etc.