Search code examples
mysqlsqlquery-optimizationdatabase-performance

MySQL: index used only when I select only primary key


I have this table in MySQL:

Table: city
Columns: 
  ID int(11) AI PK 
  Name char(35) 
  CountryCode char(3) 
  District char(20) 
  Population int(11)

I want to use index when order by CountryCode so I created index on this column:

Index: CountryCode
Definition:
  Type BTREE 
  Unique No 
  Columns CountryCode

Why EXPLAIN EXTENDED SELECT * FROM city ORDER BY CountryCode; doesn't use index:

| id: 1 
| select type: SIMPLE 
| table: city 
| partitions: NULL
| type: ALL 
| possible_keys: NULL
| key: NULL
| key_len: NULL
| ref: NULL 
| rows: 4188 
| filtered: 100.00 
| Extra: Using filesort

and EXPLAIN EXTENDED SELECT id FROM world.city ORDER BY CountryCode; use index:

| id: 1
| select type: SIMPLE
| table: city
| partitions: NULL
| type: index
| possible_keys: NULL
| key: CountryCode
| key_len: 3
| ref: null
| rows: 4188
| filtered: 100.00
| Extra: Using index

How can I change this behaviour - that is add index usage in first example?


Solution

  • Not a problem. The first query runs faster by not using the index.

    First, let's dissect the second query. Apparently you are using InnoDB. That means that INDEX(country_code) is really INDEX(country_code, id). That is, the PRIMARY KEY is implicitly tacked onto any secondary key. The second query only needs country_code and id, so it can be performed entirely in the index, as is indicated by Using index. It read the index, in order, and delivered the results. Very efficient.

    Now, let's look at the first query. This time you asked for *, not just id. The rest of the fields are not in the index. If it were to use the index (and you can test this with ... FROM city FORCE INDEX(CountryCode) ...) it would have to do this:

    1. Scan through the index -- this delivers the ids in order, but not the rest of *.
    2. For each id, reach into the data for the rest of *. This is costly, especially if the table cannot be fully cached.

    If, instead, it did a "table scan" (which is what you are seeing), the code works like this:

    1. Read the entire table into a tmp table. (This will be MEMORY or MyISAM depending several criteria.)
    2. Sort that tmp table (either in RAM on on disk).

    The table scan + sort will probably be faster.

    Other notes

    If your cities are in the US, "Los Ranchos de Albuquerque", NM, is too long for the field. For the world, consider the 87-char "Imeni 50-letiya (Pyat'desyatiletiya) Kazakhskoy Sovetskoy Sotsialisticheskoy Respubliki" in KZ.

    Do not use CHAR for variable length strings (city, district); use VARCHAR; it will be more efficient in space and, hence, in speed.

    Use CHAR(3) for the fixed-length country_code, but be sure to specify CHARACTER SET ascii. If you use the default of utf8, it will take 9 bytes, not 3.