Search code examples
mysqlsqldatabaseperformancedatabase-performance

Is full table scan needed for counting rows with some attribute bigger than x?


Let's say there a table of people, with an age column which is indexed. How fast would be a query to count people older than 20: SELECT COUNT(*) FROM people WHERE age > 20? Is full table scan required? The database is MySQL.


Solution

  • if the column age is not indexed, then yes, a full table scan is required. Even if it is indexed, if the data distribution of age values is such that there are more than a certain threshold percentage of the records that have age > 20, then a table scan is required anyway. it works this way, for each row that would be returned by the query, the processor must execute n disk IO operations, where n is the number of levels in the index... If there are, say a million rows in the table, and the index on age is 5 levels deep, then if there are more than 200k rows with age value > 20 then for each of those rows the processor has to execute 5 I/Os, for a total of 200k * 5 = 1 million I/Os, so, the optimizer says, if my statistics indicate that more than 200k rows would be returned, I might as well do a complete table scan, that will require less than 1 Million I/Os.

    The only exception to this is if the entire table is clustered on the age column, then you only need to traverse the index for the boundaries of the age range you want to filter on.