Search code examples
indexingmysql-8.0

How to make an index for when querying json array (array of object) in Mysql 8.0.35?


Environment

  • Mysql 8.0.35

My Situation

I want to get all rows if cancel.cancels[*].cancel_no is exactly "202401215050123". and also I want to use multi value index to optimize the query.

INSERT INTO pay.payment (payment_no, cancel) VALUES ('2024012150200010', '{"cancels": [{"cancel_no": "202401215050123", "amount": 100}, {"cancel_no": "202401215050125", "amount": 200}]}');

What I have tried so far

I started with fresh database.

SELECT VERSION(); -- 8.0.35--

CREATE DATABASE pay;

USE pay;

create table payment
(
    payment_no varchar(50) not null
        primary key,
    cancel     json        null
);


-- this does not work --
ALTER TABLE payment
  ADD INDEX cancel_no_idx ((
    CAST(cancel->>"$.cancels[*].cancel_no" as CHAR(255) ARRAY) COLLATE utf8mb4_bin
  )) USING BTREE;


-- this works --
ALTER TABLE payment
  ADD INDEX cancel_no_idx ((
    CAST(cancel->>"$.cancels[*].cancel_no" as CHAR(255) ARRAY)
  )) USING BTREE;


-- insert a row --
INSERT INTO pay.payment (payment_no, cancel) VALUES ('2024012150200010', '{"cancels": [{"cancel_no": "202401215050123", "amount": 100}]}');

-- query  --
EXPLAIN SELECT cancel FROM payment WHERE JSON_CONTAINS(cancel, '{"cancel_no": "202401215050123"}', '$.cancels');

but the explain result says it's select_type is SIMPLE.

My Question

  • How do I make an index for this purpose and see if query is using the index?
  • please aware that multiple "cancel" can be in the cancels array.
  • and also there can be 0 element in cancels array
  • it doesn't have to be multi value index. I want want an efficient way (possibly index) to query.

How to use multi value index when querying json array in Mysql 8.0.35?

I have asked similar question before, but I only got the answer for cancel[0].


Solution

  • Quick answer

    Use this query

    SELECT cancel FROM payment WHERE JSON_CONTAINS(cancel->'$.cancels[*].cancel_no', '202401215050124');
    

    And index that you have created

    ALTER TABLE payment
      ADD INDEX cancel_no_idx ((
        CAST(cancel->>"$.cancels[*].cancel_no" as CHAR(255) ARRAY)
      )) USING BTREE;
    

    Then EXPLAIN found possible keys

    EXPLAIN SELECT cancel FROM payment WHERE JSON_CONTAINS(cancel->'$.cancels[*].cancel_no', '202401215050124');
    

    Long answer

    Your index that works is good, the issue in in query

    The issue is in SELECT query

     JSON_CONTAINS(cancel, '{"cancel_no": "202401215050123"}', '$.cancels')
    

    You are querying for a whole object no just values of cancel_no and (I guess) MySQL Engine is not trying to check for what keys index exist and for what not, query JSON is already complex.

    First try to resolve it may look like get all values of the array

    SELECT cancel FROM payment WHERE JSON_CONTAINS(cancel, '202401215050124', '$.cancels[*].cancel_no');
    

    But this doesn't work because error says:

    In this situation, path expressions may not contain the * and ** tokens or an array range.
    

    To solve it we can try use extract instead and remove path argument

    SELECT cancel FROM payment WHERE JSON_CONTAINS(cancel->'$.cancels[*].cancel_no', '202401215050124');
    

    This form only works when INDEX cancel_no_idx is present, because strings in json is surrounded by quotes("), to make it works in any case make your query argument in double quotes " like this '"202401215050124"'. It can't be solved with JSON_UNQUOTE(so ->> will also not work) as it works only on string not array of strings.

    Using EXPLAIN on this query will show you that index has been found for this query.

    Another way if to solve it when we know extracted value by cancel->'$.cancels[*].cancel_no' is an array, is to use MEMBER OF because it expected provided argument to be array

    SELECT cancel FROM payment WHERE '202401215050124' MEMBER OF(JSON_EXTRACT(cancel, "$.cancels[*].cancel_no"));
    

    Testing values that I used

    INSERT INTO payment (payment_no, cancel) VALUES ('2024012150200010', '{"cancels": [{"cancel_no": "202401215050123", "amount": 100}, {"cancel_no": "202401215050124", "amount": 101}]}');
    INSERT INTO payment (payment_no, cancel) VALUES ('2024012150200011', '{"cancels": [{"cancel_no": "202401215050124", "amount": 200}, {"cancel_no": "202401215050125", "amount": 201}]}');
    INSERT INTO payment (payment_no, cancel) VALUES ('2024012150200012', '{"cancels": []}');
    INSERT INTO payment (payment_no, cancel) VALUES ('2024012150200013', '{"cancels": [{"cancel_no": "202401215050126", "amount": 200}]}');