Search code examples
mysqlindexingdisk

What's the rate-limiting step for mysql query, index query or disk read?


For the basic mysql query like select * from table where c1=v1 and c2=v2 and c3=v3, I realize the latency of the query is proportional to response payload size, say return 500 rows is slower than 200 rows, which sounds reasonable to me. But my question is what makes 500 rows slower than 200? Is it index searching or disk read?

From my understanding the underlying mysql query steps are first searching index to find offset of disk file (data location), then actually fetching data from disk based on location. (Correct me if I'm wrong).

Then if the table is well-indexed, it should be disk reading that limits speed? But if table is not well-indexed, searching part can be linear and slow down query?


Solution

  • I realize the latency of the query is proportional to response payload size...

    No, this is not correct.

    The latency is closer to proportional to the number of examined rows, which may be quite a larger number than the number of rows returned.

    For example, given your query:

    select * from table where c1=v1 and c2=v2 and c3=v3
    

    If the table has an index on c1, but not on c2 or c3, then at least the index on c1 helps to reduce the examined rows down to those that match c1=v1. Of those examined rows, a lot of them won't satisfy the other conditions, and will be discarded instead of being returned as part of the query result.

    From my understanding the underlying mysql query steps are first searching index to find offset of disk file (data location), then actually fetching data from disk based on location.

    This is roughly true for MyISAM files, but it's not accurate for InnoDB, which has been the default storage engine for MySQL since 2010.

    If you do a search in an InnoDB table and the optimizer uses an index, the value it finds in the index is not an offset to find the record, it's a primary key value. So an index lookup has to do a second search in the clustered index by that primary key value.

    Then if the table is well-indexed, it should be disk reading that limits speed? But if table is not well-indexed, searching part can be linear and slow down query?

    The answer to this is also more complicated with InnoDB, because some or all the pages you need might be in RAM already. Pages are loaded into the InnoDB buffer pool as a result of queries doing searches. The pages don't leave RAM immediately, which is good, because subsequent queries might need to search a similar set of pages.

    This means that over time, the cost of I/O when loading pages from disk is probably amortized over many queries. The more queries that use the same pages in RAM, the more it lowers the average cost per page read.