I'm playing with MySQL 8's new JSON capabilities, in particular multi-value indexes.
I noticed that there are 2 ways to check if a JSON array contains a given value: the MEMBER OF() operator, and the JSON_CONTAINS() function.
They both return the same set of results for every query I make, but surprisingly, MEMBER OF
seems to be 3x faster than JSON_CONTAINS
.
Example on a table with 200,000 records, with around 700,000 values total in the catIds
fields:
CREATE TABLE test (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
catIds JSON NOT NULL,
PRIMARY KEY (id),
KEY test_categories ((CAST(catIds AS UNSIGNED ARRAY)))
);
INSERT INTO test (catIds) VALUES('[123, 456]');
...
... some 200,000 inserted records later:
mysql> SELECT count(*) FROM test WHERE 51 member of (catIds);
+----------+
| count(*) |
+----------+
| 7287 |
+----------+
1 row in set (0.11 sec)
mysql> SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, '51');
+----------+
| count(*) |
+----------+
| 7287 |
+----------+
1 row in set (0.36 sec)
If first thought that this was because the stringified JSON value '51'
was maybe converted on each iteration, so I tried assigning it to a variable first; but this does not make it any faster:
mysql> SET @value = CAST(51 as JSON);
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, @value);
+----------+
| count(*) |
+----------+
| 7287 |
+----------+
1 row in set (0.38 sec)
Unless I'm mistaken, MEMBER OF()
and JSON_CONTAINS()
are equivalent feature-wise. In this case, why is one of them faster than the other?
JSON_CONTAINS()
does more complex work than MEMBER OF
.
JSON_CONTAINS()
must parse its second argument, a candidate JSON document that you're searching for within the stored JSON document.
The candidate may not be a simple scalar as you're searching for in your example above. It could be a more complex document with its own nested arrays and objects.
Therefore comparing the candidate to the stored document must potentially compare in a more complex manner, not just searching for a single scalar value, but comparing recursively, all nested elements.
Even if your example search is for a simple scalar value, it still invokes the same code path that might need to search for a complex document. Based on your timing measurement, that code path seems to have more overhead.
Whereas MEMBER OF
only searches for a scalar value, and only searches in an array. It can also optimize by using a cached, pre-sorted array.
See https://github.com/mysql/mysql-server/blob/8.0/sql/item_json_func.cc#L3852 for the code.