I have one table contains about 3 million rows which structure as follow:
CREATE TABLE `profiles3m` (
`uid` int(10) unsigned NOT NULL,
`birth_date` date NOT NULL,
`gender` tinyint(4) NOT NULL DEFAULT '0',
`country` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'ID',
`city` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'Makassar',
`created_at` timestamp NULL DEFAULT NULL,
`premium` tinyint(4) NOT NULL DEFAULT '0',
`updated_at` timestamp NULL DEFAULT NULL,
`latitude` double NOT NULL DEFAULT '0',
`longitude` double NOT NULL DEFAULT '0',
`orderid` int(11) NOT NULL,
PRIMARY KEY (`uid`),
KEY `idx_composites_latitude_longitude_gender_birth_date_created_at` (`latitude`,`longitude`,`country`,`city`,`gender`,`birth_date`) USING BTREE,
KEY `idx_composites_country_city_gender_birth_date` (`country`,`city`,`gender`,`birth_date`,`orderid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I am failed to tell MySQL Optimizer to use all columns in the Composite index definition, seems like the optimizer just ignoring the last column as orderid
for ordering purpose which is just a copy of uid
column as you might know PRIMARY KEY in InnoDB table cannot use to ordering because it may instruct the optimizer to use the PRIMARY KEY as index rather than using our composite Indexes and that is the idea of the creation of orderid
column comes from.
The following SQL query, along with the Explain JSON, plus Show Index statement to show all Index Statistics on the table may help to analysing the caused.
SELECT
pro.uid
FROM
`profiles3m` AS pro
WHERE
pro.country = 'INDONESIA'
AND pro.city IN ( 'MAKASSAR' )
AND pro.gender = 0
AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) )
AND pro.orderid > 0
ORDER BY
pro.orderid
LIMIT 30
Explain JSON as follows:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "45278.73"
},
"ordering_operation": {
"using_filesort": true,
"cost_info": {
"sort_cost": "19051.43"
},
"table": {
"table_name": "pro",
"access_type": "range",
"possible_keys": [
"idx_composites_country_city_gender_birth_date"
],
"key": "idx_composites_country_city_gender_birth_date",
"used_key_parts": [
"country",
"city",
"gender",
"birth_date"
],
"key_length": "488",
"rows_examined_per_scan": 57160,
"rows_produced_per_join": 19051,
"filtered": "33.33",
"using_index": true,
"cost_info": {
"read_cost": "22417.02",
"eval_cost": "3810.29",
"prefix_cost": "26227.30",
"data_read_per_join": "9M"
},
"used_columns": [
"uid",
"birth_date",
"gender",
"country",
"city",
"orderid"
],
"attached_condition": "((`restful`.`pro`.`gender` = 0) and (`restful`.`pro`.`country` = 'INDONESIA') and (`restful`.`pro`.`city` = 'MAKASSAR') and (`restful`.`pro`.`birth_date` between <cache>((now() - interval 35 year)) and <cache>((now() - interval 25 year))) and (`restful`.`pro`.`orderid` > 0))"
}
}
}
}
below is for show index statement :
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 0 | PRIMARY | 1 | uid | A | 2984412 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_latitude_longitude_gender_birth_date_created_at | 1 | latitude | A | 2934360 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_latitude_longitude_gender_birth_date_created_at | 2 | longitude | A | 2984080 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_latitude_longitude_gender_birth_date_created_at | 3 | country | A | 2984080 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_latitude_longitude_gender_birth_date_created_at | 4 | city | A | 2984080 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_latitude_longitude_gender_birth_date_created_at | 5 | gender | A | 2984080 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_latitude_longitude_gender_birth_date_created_at | 6 | birth_date | A | 2984080 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_country_city_gender_birth_date | 1 | country | A | 1 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_country_city_gender_birth_date | 2 | city | A | 14 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_country_city_gender_birth_date | 3 | gender | A | 29 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_country_city_gender_birth_date | 4 | birth_date | A | 362449 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 1 | idx_composites_country_city_gender_birth_date | 5 | orderid | A | 2984412 | | | | BTREE |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
What really interesting to look in Explain JSON, they told us if the optimizer can only use four part of our indexed and not surprisingly ordering operation is using filesort as you know means slower execution which is bad for application performance.
idx_composites_country_city_gender_birth_date
(country
,city
,gender
,birth_date
,orderid
)
"ordering_operation": {
"using_filesort": true,
.....
"key": "idx_composites_country_city_gender_birth_date",
"used_key_parts": [
"country",
"city",
"gender",
"birth_date"
],
Do i missed something, is it caused by RANGE
clause in our WHERE
statement?, i've been tested with different combinations of columns in our Composite index sequence for example i am changing orderid
column with premium
which is a flag column type which only contain 0 and 1, and it worked MySQL Optimizer can utilising all five columns, then why the Optimizer can't do the same with orderid
column? is it having to do with Cardinality? i am not so sure, the only thing i can assure is that i must make the ORDER BY
working without any impact to the application performance no matter how to do it.
I've been searching the answer in this couple days, but still cannot resolve it. almost forgot to mention MySQL Version in case it helps.
+------------+
| version() |
+------------+
| 5.7.29-log |
+------------+
You noticed that it is only using four of the columns of the index:
"used_key_parts": [
"country",
"city",
"gender",
"birth_date"
],
Despite the conditions in your WHERE clause referencing all five columns:
WHERE
pro.country = 'INDONESIA'
AND pro.city IN ( 'MAKASSAR' )
AND pro.gender = 0
AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) )
AND pro.orderid > 0
However, there's something different about these conditions. The conditions on country
, city
, gender
are all equality conditions. Once the search finds the subset of the index with those values, then the subset is ordered by birth_date
next, and if there are some rows that are tied for birth_date
, these are further ordered by orderid
.
Just like if you read a telephone book, and you find all people whose last name is "Smith", they are ordered by first name. If there are multiple people who have the same first name as well, they are ordered in the phone book according to their respective phone number.
Smith, Sarah 408-555-1234
Smith, Sarah 408-555-5678
But what if you search for all people with last name Smith and a variety of first names beginning with "S"?
Smith, Sam 408-555-3298
Smith, Sarah 408-555-1234
Smith, Sarah 408-555-5678
Smith, Stan 408-555-4224
These are not in sorted order by phone number. They sort by last name, then by first name, then by phone number only if they are tied in the preceding columns.
If you want to get them sorted by phone number, you could create an index with columns in another order, like last name, phone number, first name.
Smith 408-555-1234 Sarah
Smith 408-555-2020 David
Smith 408-555-3298 Sam
Smith 408-555-4100 Charlie
Smith 408-555-4224 Stan
Smith 408-555-5555 Annette
Smith 408-555-5678 Sarah
Now they are in phone number order, but there are other names among them that don't match your condition for first names beginning with "S". They aren't even in sorted order by first name, because the third column for first name would be sorted only when the first two columns are tied.
This points out a general problem with indexes: You can reorder the columns only for columns involved in equality comparisons. If you want to sort results, you can use the index only if you sort by a column in the index and all preceding columns of the index are used for equality comparisons only.
Once you reference one column in a range comparison, any subsequent columns in the index are ignored for both searching and sorting.
Stated another way: the index can have any number of columns for equality conditions, and the next column of the index can be used for either a range condition, or sorting the results. But not more than one column is used for either of those operations.
You can't optimize everything.
Re your comment: If you have an index on the columns excluding birth_date
:
alter table profiles3m add key bk1 (country, city, gender, orderid);
Then the EXPLAIN shows there is no filesort:
EXPLAIN SELECT
pro.uid
FROM
`profiles3m` AS pro
WHERE
pro.country = 'INDONESIA'
AND pro.city IN ( 'MAKASSAR' )
AND pro.gender = 0
AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) )
AND pro.orderid > 0
ORDER BY
pro.orderid
LIMIT 30\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: pro
partitions: NULL
type: range
possible_keys: bk1
key: bk1
key_len: 489
ref: NULL
rows: 1
filtered: 100.00
Extra: Using index condition; Using where
(The rows
looks low because I'm testing this with an empty table.)
The caveat is that this uses the index to match all rows matched by country
, city
, gender
, and orderid
. Then MySQL will evaluate the remaining condition on birth_date
the hard way: row by row.
But after that, the optimizer knows that it has already fetched the rows in the index order, so it knows that will naturally be in order by orderid
, so it can skip the filesort.
This might or might not be a net win. It depends on how many rows are matched but have to be thrown out by the condition on birth_date
. And how costly it is to evaluate that condition for each row. And how does that compare with the savings you would have gotten by using the index to filter by birth_date
.