Search code examples
mysqlsqldatabaseoptimizationdatabase-performance

MYSQL Optimizer just ignore the last column which i use to ORDER BY in the composite indexes


I have one table contains about 3 million rows which structure as follow:

CREATE TABLE `profiles3m` (
  `uid` int(10) unsigned NOT NULL,
  `birth_date` date NOT NULL,
  `gender` tinyint(4) NOT NULL DEFAULT '0',
  `country` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'ID',
  `city` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'Makassar',
  `created_at` timestamp NULL DEFAULT NULL,
  `premium` tinyint(4) NOT NULL DEFAULT '0',
  `updated_at` timestamp NULL DEFAULT NULL,
  `latitude` double NOT NULL DEFAULT '0',
  `longitude` double NOT NULL DEFAULT '0',
  `orderid` int(11) NOT NULL,
  PRIMARY KEY (`uid`),
  KEY `idx_composites_latitude_longitude_gender_birth_date_created_at` (`latitude`,`longitude`,`country`,`city`,`gender`,`birth_date`) USING BTREE,
  KEY `idx_composites_country_city_gender_birth_date` (`country`,`city`,`gender`,`birth_date`,`orderid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I am failed to tell MySQL Optimizer to use all columns in the Composite index definition, seems like the optimizer just ignoring the last column as orderid for ordering purpose which is just a copy of uid column as you might know PRIMARY KEY in InnoDB table cannot use to ordering because it may instruct the optimizer to use the PRIMARY KEY as index rather than using our composite Indexes and that is the idea of the creation of orderid column comes from.

The following SQL query, along with the Explain JSON, plus Show Index statement to show all Index Statistics on the table may help to analysing the caused.

SELECT
    pro.uid 
FROM
    `profiles3m` AS pro 
WHERE
    pro.country = 'INDONESIA' 
    AND pro.city IN ( 'MAKASSAR' ) 
    AND pro.gender = 0 
    AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) ) 
    AND pro.orderid > 0 
ORDER BY
    pro.orderid
LIMIT 30

Explain JSON as follows:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "45278.73"
    },
    "ordering_operation": {
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "19051.43"
      },
      "table": {
        "table_name": "pro",
        "access_type": "range",
        "possible_keys": [
          "idx_composites_country_city_gender_birth_date"
        ],
        "key": "idx_composites_country_city_gender_birth_date",
        "used_key_parts": [
          "country",
          "city",
          "gender",
          "birth_date"
        ],
        "key_length": "488",
        "rows_examined_per_scan": 57160,
        "rows_produced_per_join": 19051,
        "filtered": "33.33",
        "using_index": true,
        "cost_info": {
          "read_cost": "22417.02",
          "eval_cost": "3810.29",
          "prefix_cost": "26227.30",
          "data_read_per_join": "9M"
        },
        "used_columns": [
          "uid",
          "birth_date",
          "gender",
          "country",
          "city",
          "orderid"
        ],
        "attached_condition": "((`restful`.`pro`.`gender` = 0) and (`restful`.`pro`.`country` = 'INDONESIA') and (`restful`.`pro`.`city` = 'MAKASSAR') and (`restful`.`pro`.`birth_date` between <cache>((now() - interval 35 year)) and <cache>((now() - interval 25 year))) and (`restful`.`pro`.`orderid` > 0))"
      }
    }
  }
}

below is for show index statement :

+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Non_unique | Key_name                                                       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 0          | PRIMARY                                                        | 1            | uid         | A         | 2984412     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 1            | latitude    | A         | 2934360     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 2            | longitude   | A         | 2984080     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 3            | country     | A         | 2984080     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 4            | city        | A         | 2984080     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 5            | gender      | A         | 2984080     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 6            | birth_date  | A         | 2984080     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_country_city_gender_birth_date                  | 1            | country     | A         | 1           |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_country_city_gender_birth_date                  | 2            | city        | A         | 14          |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_country_city_gender_birth_date                  | 3            | gender      | A         | 29          |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_country_city_gender_birth_date                  | 4            | birth_date  | A         | 362449      |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1          | idx_composites_country_city_gender_birth_date                  | 5            | orderid     | A         | 2984412     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+

What really interesting to look in Explain JSON, they told us if the optimizer can only use four part of our indexed and not surprisingly ordering operation is using filesort as you know means slower execution which is bad for application performance.

idx_composites_country_city_gender_birth_date (country,city,gender,birth_date,orderid)

"ordering_operation": {
          "using_filesort": true,
.....

"key": "idx_composites_country_city_gender_birth_date",    
"used_key_parts": [
              "country",
              "city",
              "gender",
              "birth_date"
            ],

Do i missed something, is it caused by RANGE clause in our WHERE statement?, i've been tested with different combinations of columns in our Composite index sequence for example i am changing orderid column with premium which is a flag column type which only contain 0 and 1, and it worked MySQL Optimizer can utilising all five columns, then why the Optimizer can't do the same with orderid column? is it having to do with Cardinality? i am not so sure, the only thing i can assure is that i must make the ORDER BY working without any impact to the application performance no matter how to do it.

I've been searching the answer in this couple days, but still cannot resolve it. almost forgot to mention MySQL Version in case it helps.

+------------+
| version()  |
+------------+
| 5.7.29-log |
+------------+

Solution

  • You noticed that it is only using four of the columns of the index:

        "used_key_parts": [
          "country",
          "city",
          "gender",
          "birth_date"
        ],
    

    Despite the conditions in your WHERE clause referencing all five columns:

    WHERE
        pro.country = 'INDONESIA' 
        AND pro.city IN ( 'MAKASSAR' ) 
        AND pro.gender = 0 
        AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) ) 
        AND pro.orderid > 0 
    

    However, there's something different about these conditions. The conditions on country, city, gender are all equality conditions. Once the search finds the subset of the index with those values, then the subset is ordered by birth_date next, and if there are some rows that are tied for birth_date, these are further ordered by orderid.

    Just like if you read a telephone book, and you find all people whose last name is "Smith", they are ordered by first name. If there are multiple people who have the same first name as well, they are ordered in the phone book according to their respective phone number.

    Smith, Sarah 408-555-1234
    Smith, Sarah 408-555-5678
    

    But what if you search for all people with last name Smith and a variety of first names beginning with "S"?

    Smith, Sam   408-555-3298
    Smith, Sarah 408-555-1234
    Smith, Sarah 408-555-5678
    Smith, Stan  408-555-4224
    

    These are not in sorted order by phone number. They sort by last name, then by first name, then by phone number only if they are tied in the preceding columns.

    If you want to get them sorted by phone number, you could create an index with columns in another order, like last name, phone number, first name.

    Smith 408-555-1234 Sarah
    Smith 408-555-2020 David
    Smith 408-555-3298 Sam
    Smith 408-555-4100 Charlie
    Smith 408-555-4224 Stan
    Smith 408-555-5555 Annette
    Smith 408-555-5678 Sarah
    

    Now they are in phone number order, but there are other names among them that don't match your condition for first names beginning with "S". They aren't even in sorted order by first name, because the third column for first name would be sorted only when the first two columns are tied.

    This points out a general problem with indexes: You can reorder the columns only for columns involved in equality comparisons. If you want to sort results, you can use the index only if you sort by a column in the index and all preceding columns of the index are used for equality comparisons only.

    Once you reference one column in a range comparison, any subsequent columns in the index are ignored for both searching and sorting.

    Stated another way: the index can have any number of columns for equality conditions, and the next column of the index can be used for either a range condition, or sorting the results. But not more than one column is used for either of those operations.

    You can't optimize everything.


    Re your comment: If you have an index on the columns excluding birth_date:

    alter table profiles3m add key bk1 (country, city, gender, orderid);
    

    Then the EXPLAIN shows there is no filesort:

    EXPLAIN SELECT
        pro.uid 
    FROM
        `profiles3m` AS pro 
    WHERE
        pro.country = 'INDONESIA' 
        AND pro.city IN ( 'MAKASSAR' ) 
        AND pro.gender = 0 
        AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) ) 
        AND pro.orderid > 0 
    ORDER BY
        pro.orderid
    LIMIT 30\G
    
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: pro
       partitions: NULL
             type: range
    possible_keys: bk1
              key: bk1
          key_len: 489
              ref: NULL
             rows: 1
         filtered: 100.00
            Extra: Using index condition; Using where
    

    (The rows looks low because I'm testing this with an empty table.)

    The caveat is that this uses the index to match all rows matched by country, city, gender, and orderid. Then MySQL will evaluate the remaining condition on birth_date the hard way: row by row.

    But after that, the optimizer knows that it has already fetched the rows in the index order, so it knows that will naturally be in order by orderid, so it can skip the filesort.

    This might or might not be a net win. It depends on how many rows are matched but have to be thrown out by the condition on birth_date. And how costly it is to evaluate that condition for each row. And how does that compare with the savings you would have gotten by using the index to filter by birth_date.