I know that if a table is too big, its indexes can hardly fit into the buffer_pool, so using an index may result in a large number of random disk I/O operations. So a full table scan is, in general, probably much faster than an index scan, even when the query reads only about 1% of the rows.
What I am confused about is:
[0] If there is one big table (30 million rows) and many small tables (each of which fits into the buffer pool), will the big table also affect queries on the small tables?
My logic is: the buffer is shared by the whole database, so the big table will take up most of it. So the indexes of the small tables can hardly fit into the buffer either (or they are often evicted from it). Then the conclusion above (full table scan vs. index scan) applies to this case as well.
[1] When the big table is partitioned into many small tables (on a single machine), the buffer situation should stay the same.
So such partitioning cannot solve this problem (full table scan vs. index scan), right? So "big table" should not mean "one big table", but rather "a huge database" or "a large total amount of data".
To sum up, is my conclusion right? If it is wrong, why? Please give me a hint. Thanks very much.
The buffer_pool is shared across all tables, data and indexes. But the rest of what you said needs to focus on "blocks" instead of "tables".
Caching is performed on a block basis. A block (in InnoDB) is 16KB. Most of the innodb_buffer_pool_size is dedicated to data and index blocks.
The cache is run (approximately) as LRU (Least Recently Used) -- That is, the least recently used blocks are tossed from the cache when other blocks are needed.
No, a table or index is not "entirely" loaded into the cache. Instead, the desired blocks are loaded (and purged) when needed.
If all the data and indexes fit into the cache, then (eventually) all the blocks will 'live' there.
If the data plus indexes are too big, then blocks will come and go as needed. Usually this is nearly as good as having them all loaded. For example, if you are usually using "recent" records, then the blocks containing them will 'stay' in the cache; meanwhile "old" blocks will get bumped out.
If you are using UUIDs (GUIDs), performance can get really bad -- this is because of the random nature of such indexed values.
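To make the "recent" versus random contrast concrete, here is a toy simulation in Python. It is only a sketch: it replays block reads through a plain LRU cache, which is a simplification of what InnoDB actually does, and every number in it (total blocks, cache size, hot set, 95% ratio) is a made-up assumption for illustration, not a measurement.

```python
import random
from collections import OrderedDict

def hit_rate(accesses, cache_blocks):
    """Replay block ids through a plain LRU cache; return the fraction of hits."""
    cache = OrderedDict()  # least recently used block sits at the front
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True            # "read from disk" into the cache
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # toss the least recently used block
    return hits / len(accesses)

rng = random.Random(42)   # fixed seed so runs are repeatable
TOTAL_BLOCKS = 10_000     # hypothetical: all data + index blocks on disk
CACHE_BLOCKS = 1_000      # hypothetical: the cache holds 10% of them
HOT_BLOCKS = 500          # hypothetical: blocks holding the "recent" rows
READS = 50_000

# Workload 1: 95% of reads go to the newest blocks, the rest anywhere.
recent = [
    rng.randrange(TOTAL_BLOCKS - HOT_BLOCKS, TOTAL_BLOCKS)
    if rng.random() < 0.95
    else rng.randrange(TOTAL_BLOCKS)
    for _ in range(READS)
]
# Workload 2: every read lands on a random block, as with UUID-style keys.
scattered = [rng.randrange(TOTAL_BLOCKS) for _ in range(READS)]

recent_rate = hit_rate(recent, CACHE_BLOCKS)
scattered_rate = hit_rate(scattered, CACHE_BLOCKS)
print(f"mostly-recent reads: {recent_rate:.0%} hits")
print(f"scattered reads:     {scattered_rate:.0%} hits")
```

With these assumed numbers, the mostly-recent workload should be served almost entirely from cache even though the data is 10 times bigger than the cache, while the scattered workload hits only about as often as the cache-to-data ratio.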
Full table scans (and full index scans) should be avoided whether or not things are too big to fit in cache. They are costly, and they can usually be avoided by proper indexing and/or query formulation.
When you do a full table scan on a table that is bigger than the cache, something's gotta give. You will have to do some I/O, and some blocks will be bumped out of cache. However, there is a technique built in that prevents blindly purging the entire cache for an occasional table scan. For further discussion, research innodb_old_blocks_pct. (No, I don't recommend changing it from the default 37%.)
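The idea behind that protection can be sketched in Python. This is a toy model, not InnoDB's implementation: the class name MidpointLRU and all sizes are hypothetical, and it ignores details such as innodb_old_blocks_time. Newly read blocks enter an "old" sublist (37% of the cache here) and only move to the "young" sublist when touched again, so blocks read once by a scan are evicted before the hot blocks are.

```python
from collections import OrderedDict

class MidpointLRU:
    """Toy two-sublist LRU: newly read blocks start in the 'old' sublist."""

    def __init__(self, capacity, old_pct=37):
        self.capacity = capacity
        self.old_cap = max(1, capacity * old_pct // 100)
        self.young = OrderedDict()  # most recently used at the end
        self.old = OrderedDict()

    def access(self, block):
        if block in self.young:
            self.young.move_to_end(block)
        elif block in self.old:
            # Touched again while in the old sublist: promote to young.
            del self.old[block]
            self.young[block] = True
            # Keep young within its share by demoting its coldest block.
            if len(self.young) > self.capacity - self.old_cap:
                demoted, _ = self.young.popitem(last=False)
                self.old[demoted] = True
        else:
            # Miss: the block is read in and placed in the old sublist.
            self.old[block] = True
        # Over capacity: evict from the cold end of the old sublist first.
        while len(self.young) + len(self.old) > self.capacity:
            victim_list = self.old if self.old else self.young
            victim_list.popitem(last=False)

    def __contains__(self, block):
        return block in self.young or block in self.old

def plain_lru_survivors(hot, scan, capacity):
    """Same workload through a single-list LRU, for comparison."""
    cache = OrderedDict()
    for block in list(hot) * 2 + list(scan):
        if block in cache:
            cache.move_to_end(block)
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
    return sum(1 for b in hot if b in cache)

CAPACITY = 1_000                 # hypothetical cache size in blocks
hot = range(500)                 # hypothetical frequently used blocks
scan = range(10_000, 15_000)     # one-off table scan, 5x the cache size

pool = MidpointLRU(CAPACITY)
for block in list(hot) * 2:      # each hot block is touched twice
    pool.access(block)
for block in scan:               # each scanned block is touched once
    pool.access(block)

midpoint_survivors = sum(1 for b in hot if b in pool)
plain_survivors = plain_lru_survivors(hot, scan, CAPACITY)
print("hot blocks still cached, midpoint LRU:", midpoint_survivors)
print("hot blocks still cached, plain LRU:   ", plain_survivors)
```

In this model the scan flushes every hot block out of the plain LRU, but none out of the two-sublist version, because the scanned blocks never leave the old sublist.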
What do you mean by partitioning a table? If you mean the builtin PARTITION mechanism, then so what? If you scan a table you are scanning all the partitions. Same number of blocks; same impact on the cache.
I have dealt with sets of tables that exceed the buffer_pool by a factor of 10 or more. I can discuss performance techniques, but I need a specific SHOW CREATE TABLE (with or without PARTITIONs) and some of the naughty queries (such as table scans).
The Optimizer chooses between doing a table scan and using an index based on a variety of statistics, etc. A Rule of Thumb is that, if more than 20% of the rows need to be touched, it will do a table scan instead of bouncing between the index and the data. (Note: the cutoff is much higher than the 1% you mentioned.)
An Index is structured as a BTree in 16KB blocks, so it is very efficient to start in the middle and scan a range. For example: INDEX(last_name) for WHERE last_name LIKE 'J%' would probably do a "range scan" of 10% of the index, even if that involved bouncing over to the table a lot.
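The "start in the middle and scan a range" idea can be illustrated with a sorted list standing in for the index. This is a rough Python sketch, not how InnoDB stores anything: the names and row ids are invented, prefix_range is a hypothetical helper, and it ignores collation (MySQL comparisons are usually case-insensitive).

```python
from bisect import bisect_left

# Hypothetical index entries (last_name, row_id), kept in sorted order
# the way the leaf entries of INDEX(last_name) would be.
index = sorted([
    ("Adams", 7), ("Baker", 3), ("Jackson", 1), ("Jensen", 10),
    ("Johnson", 9), ("Jones", 4), ("Kim", 2), ("Lee", 8),
    ("Smith", 5), ("Young", 6),
])

def prefix_range(entries, prefix):
    """Return the contiguous slice of entries whose name starts with prefix."""
    # Smallest string that sorts after every string with this prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(entries, (prefix,))   # binary search to the start point
    hi = bisect_left(entries, (upper,))    # binary search to the end point
    return entries[lo:hi]                  # sequential scan of just that range

matches = prefix_range(index, "J")
for last_name, row_id in matches:
    # In a real secondary index, each row_id means a lookup into the table.
    print(last_name, row_id)
```

Only the entries between the two search positions are read; everything before "J" and from "K" onward is never touched. The per-row lookups in the loop are the "bouncing over to the table" part.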