Search code examples
mysqljsonmysql-8.0

Why is MEMBER OF() faster than JSON_CONTAINS()?


I'm playing with MySQL 8's new JSON capabilities, in particular multi-value indexes.

I noticed that there are 2 ways to check if a JSON array contains a given value: the MEMBER OF() operator, and the JSON_CONTAINS() function.

They both return the same set of results for every query I make, but surprisingly, MEMBER OF seems to be 3x faster than JSON_CONTAINS.

Example on a table with 200,000 records, with around 700,000 values total in the catIds fields:

CREATE TABLE test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  catIds JSON NOT NULL,
  PRIMARY KEY (id),
  KEY test_categories ((CAST(catIds AS UNSIGNED ARRAY)))
);

INSERT INTO test (catIds) VALUES('[123, 456]');
...

... some 200,000 inserted records later:

mysql> SELECT count(*) FROM test WHERE 51 member of (catIds);
+----------+
| count(*) |
+----------+
|     7287 |
+----------+
1 row in set (0.11 sec)

mysql> SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, '51');
+----------+
| count(*) |
+----------+
|     7287 |
+----------+
1 row in set (0.36 sec)

If first thought that this was because the stringified JSON value '51' was maybe converted on each iteration, so I tried assigning it to a variable first; but this does not make it any faster:

mysql> SET @value = CAST(51 as JSON);
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, @value);
+----------+
| count(*) |
+----------+
|     7287 |
+----------+
1 row in set (0.38 sec)

Unless I'm mistaken, MEMBER OF() and JSON_CONTAINS() are equivalent feature-wise. In this case, why is one of them faster than the other?


Solution

  • JSON_CONTAINS() does more complex work than MEMBER OF.

    JSON_CONTAINS() must parse its second argument, a candidate JSON document that you're searching for within the stored JSON document.

    The candidate may not be a simple scalar as you're searching for in your example above. It could be a more complex document with its own nested arrays and objects.

    Therefore comparing the candidate to the stored document must potentially compare in a more complex manner, not just searching for a single scalar value, but comparing recursively, all nested elements.

    Even if your example search is for a simple scalar value, it still invokes the same code path that might need to search for a complex document. Based on your timing measurement, that code path seems to have more overhead.

    Whereas MEMBER OF only searches for a scalar value, and only searches in an array. It can also optimize by using a cached, pre-sorted array.

    See https://github.com/mysql/mysql-server/blob/8.0/sql/item_json_func.cc#L3852 for the code.