EDIT: I am seeing the same behavior in Python as PHP. Seems to be something with MySQL.
We are trying to upgrade from MySQL 5.7 to 8.0. Our codebase uses PHP MySQLi for queries to our MySQL server. In our test setups, we are seeing poorer performance (50x slower) on certain queries that bind lots of parameters. We want to see MySQL 8.0 run in similar time as 5.7. Below is the example table structure and trouble query.
CREATE TABLE IF NOT EXISTS `a` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `name` (`name`) USING BTREE,
KEY `name_id` (`id`,`name`) USING BTREE
);
CREATE TABLE IF NOT EXISTS `b` (
`id` int NOT NULL AUTO_INCREMENT,
`a_id` int NOT NULL,
`value` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `uniquevalue` (`a_id`,`value`) USING BTREE,
KEY `a_id` (`a_id`) USING BTREE,
KEY `v` (`value`) USING BTREE,
CONSTRAINT `b_ibfk_1` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);
CREATE TABLE IF NOT EXISTS `c` (
`product` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`b_id` int NOT NULL,
PRIMARY KEY (`product`,`b_id`) USING BTREE,
KEY `b_id` (`b_id`),
KEY `product` (`product`),
CONSTRAINT `c_ibfk_2` FOREIGN KEY (`b_id`) REFERENCES `b` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);
-- example trouble query
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
-- this hash is from the dataset (linked below) but it should match a record in the 'a' table that has an associated record in the 'b' table that in turn has an associated record in the 'c' table
INNER JOIN c on c.b_id = b.id and c.product IN (?, ?, ?...) -- ... meaning dynamic number of parameters
If the query is modified to only return one record (limit 1), the query is still slow. So it isn't about the volume of data being returned. If the query is ran non-parameterized (with string concatenation), query run time is acceptable in all environments. The more parameters you add, the slower the query gets (linear). With 7,000 bound parameters, the query runs in 100 - 150 ms in MySQL 5.7 and ~10 seconds in MySQL 8.0.28. We see the same results in PHP 7.4 and 8.0. We see the same results with MySQLi or PDO.
This tells me that it is something to do with parameter binding. I enabled profiling and checked the results for the query. The bulk of the query's time (~95%) was spent in the execution step, not the parameter binding step. Also, I see mysql 8 process CPU is pegged while query is running. I'm pretty stumped on this one.
Here is the explain for MySQL 8.0.
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | const | PRIMARY,name,name_id | name | 1022 | const | 1 | 100 | Using index | |
1 | SIMPLE | c | ref | PRIMARY,b_id,product | product | 152 | const | 1 | 100 | Using index | |
1 | SIMPLE | b | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4 | DefaultWeb.c.b_id | 1 | 5 | Using where |
Here is the explain for MySQL 5.7.
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | const | PRIMARY,name,name_id | name | 257 | const | 1 | 100 | Using index | |
1 | SIMPLE | c | ref | PRIMARY,b_id,product | PRIMARY | 152 | const | 1 | 100 | Using index | |
1 | SIMPLE | b | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4 | DefaultWeb.c.b_id | 1 | 5 | Using where |
There are some differences between these two explains, but once again this problem only occurs with prepared statements within PHP.
Below is some php code demonstrating the problem. This code is written to work against the dataset I've provided in the Google Drive link below. I've also included our MySQL variables in a CSV.
<?php
// Modify these to fit your DB connection.
const HOST = '127.0.0.1';
const USER = 'root';
const PASS = 'localtest';
const DB_NAME = 'TestDatabase';
// As the number of parameters increases, time increases linearly.
// We're seeing ~10 seconds with 7000 params with this data.
const NUM_PARAMS = 7000;
function rand_string($length = 10) {
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$charactersLength = strlen($characters);
$randomString = '';
for ($i = 0; $i < $length; $i++) {
$randomString .= $characters[rand(0, $charactersLength - 1)];
}
return $randomString;
}
function sql_question_marks($count, $sets = 1) {
return substr(str_repeat(",(".substr(str_repeat(",?", $count), 1).")", $sets), 1);
}
function unsecure_concat($params) {
return "('" . implode("','", $params) . "')";
}
$params = [];
$param_types = '';
for ($i = 0; $i < NUM_PARAMS; $i++) {
$params[] = rand_string();
$param_types .= 's';
}
$big_query = <<<SQL
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
INNER JOIN c on c.b_id = b.id and c.product IN
SQL . sql_question_marks(count($params));
$non_parameterized = <<<SQL
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
INNER JOIN c on c.b_id = b.id and c.product IN
SQL . unsecure_concat($params);
$connection = new mysqli(HOST, USER, PASS, DB_NAME);
$q = $connection->prepare($big_query);
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);
$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds
echo 'The total time for parameterized query is ' . $total_time . ' seconds.';
$q->get_result(); // not concerned with results.
$q = $connection->prepare($big_query . ' LIMIT 1');
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one also shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);
$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds
echo '<br>The total time for parameterized query with limit 1 is ' . $total_time . ' seconds.';
$q->get_result(); // not concerned with results
$q = $connection->prepare($non_parameterized);
$start_time = hrtime(true);
$q->execute(); // Same execution time in 5.7 and 8.0.
$end_time = hrtime(true);
$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds
echo '<br>The total time for non-parameterized query is ' . $total_time . ' seconds.';
You can download example data here: https://drive.google.com/file/d/111T7g1NowfWO_uZ2AhT9jdj4LiSNck8u/view?usp=sharing
EDIT: Here is the JSON explain with 7,000 bound parameters.
{
"EXPLAIN": {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "456.60"
},
"nested_loop": [
{
"table": {
"table_name": "a",
"access_type": "const",
"possible_keys": [
"PRIMARY",
"name",
"name_id"
],
"key": "name",
"used_key_parts": [
"name"
],
"key_length": "257",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "0.00",
"eval_cost": "0.10",
"prefix_cost": "0.00",
"data_read_per_join": "264"
},
"used_columns": [
"id",
"name"
]
}
},
{
"table": {
"table_name": "b",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"uniquevalue",
"a_id"
],
"key": "uniquevalue",
"used_key_parts": [
"a_id"
],
"key_length": "4",
"ref": [
"const"
],
"rows_examined_per_scan": 87,
"rows_produced_per_join": 87,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "8.44",
"eval_cost": "8.70",
"prefix_cost": "17.14",
"data_read_per_join": "65K"
},
"used_columns": [
"id",
"a_id",
"value"
]
}
},
{
"table": {
"table_name": "c",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"b_id",
"product"
],
"key": "b_id",
"used_key_parts": [
"b_id"
],
"key_length": "4",
"ref": [
"TestDatabase.b.id"
],
"rows_examined_per_scan": 35,
"rows_produced_per_join": 564,
"filtered": "18.28",
"using_index": true,
"cost_info": {
"read_cost": "130.53",
"eval_cost": "56.47",
"prefix_cost": "456.60",
"data_read_per_join": "88K"
},
"used_columns": [
"product",
"b_id"
],
"attached_condition": "" // i've omitted the condition since it breaks the SO char limit, it contains 7,000 random character strings at 10 length each
}
}
]
}
}
}
This bug seems to be resolved in MySQL 8.0.31. Thanks Oracle