Search code examples
sqlmysqlmysql-5.7mysql-json

Retrieve all answers in an array from a MySQL JSON column


I am using MySQL 5.7 and currently have a JSON column with data like this:

{"quiz_scores":[{"quiz_date":1697706210448,"symptom_improvement_score":false,"coping_skill_acquisition_score":true,"increased_self_awareness_score":true,"account_age_in_days":2197},{"quiz_date":1697706316322,"symptom_improvement_score":false,"coping_skill_acquisition_score":false,"increased_self_awareness_score":false,"account_age_in_days":2197},{"quiz_date":1697706347304,"symptom_improvement_score":true,"coping_skill_acquisition_score":true,"increased_self_awareness_score":true,"account_age_in_days":2197},{"quiz_date":1698229246050,"symptom_improvement_score":true,"coping_skill_acquisition_score":false,"increased_self_awareness_score":true,"account_age_in_days":2203},{"quiz_date":1698320799172,"symptom_improvement_score":true,"coping_skill_acquisition_score":true,"increased_self_awareness_score":true,"account_age_in_days":2204}]}

I can extract the latest result with some conditions using json_length and subtracting one.

select 
    sp.user_uuid, 
    sp.stub,
    FROM_UNIXTIME(JSON_EXTRACT(sp.`value`,CONCAT("$.quiz_scores[",JSON_LENGTH(sp.`value` ->> '$.quiz_scores')-1,"].quiz_date")) /1000) as quiz_date,
    JSON_EXTRACT(sp.`value`,CONCAT("$.quiz_scores[",JSON_LENGTH(sp.`value` ->> '$.quiz_scores')-1,"].account_age_in_days")) as account_age_in_days
from series_progress sp
where sp.stub in ('questionnaire_my_progress')
and JSON_EXTRACT(sp.`value`,CONCAT("$.quiz_scores[",JSON_LENGTH(sp.`value` ->> '$.quiz_scores')-1,"].symptom_improvement_score")) = true
and JSON_EXTRACT(sp.`value`,CONCAT("$.quiz_scores[",JSON_LENGTH(sp.`value` ->> '$.quiz_scores')-1,"].coping_skill_acquisition_score")) = true
and JSON_EXTRACT(sp.`value`,CONCAT("$.quiz_scores[",JSON_LENGTH(sp.`value` ->> '$.quiz_scores')-1,"].increased_self_awareness_score")) = true
and JSON_EXTRACT(sp.`value`,CONCAT("$.quiz_scores[",JSON_LENGTH(sp.`value` ->> '$.quiz_scores')-1,"].account_age_in_days")) > 1

What I would like to do, instead of only checking the latest score, but get all scores from the JSON. It sounds like JSON_TABLE in MySQL 8 may do the job, is there a way in MySQL 5.7?


Solution

  • To do this, make a little table with ordinal numbers:

    CREATE TABLE numbers ( n INTEGER UNSIGNED PRIMARY KEY );
    INSERT INTO numbers (n) VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
    

    This table doesn't have to have all integers up to 232, but it needs at least as many as the maximum length of JSON array you need to support.

    Then you can cross-join this to your table with JSON and get every element from a JSON array.

    SELECT
      t.user_uuid,
      t.stub,
      FROM_UNIXTIME(t.quiz_score->>'$.quiz_date'/1000) AS quiz_date,
      t.quiz_score->>'$.account_age_in_days' AS account_age_in_days
    FROM (
      SELECT
        sp.user_uuid, 
        sp.stub,
        JSON_EXTRACT(sp.`value`, CONCAT('$.quiz_scores[',n.n,']')) AS quiz_score
      FROM series_progress AS sp
      CROSS JOIN numbers AS n
      WHERE n.n < JSON_LENGTH(sp.`value`->'$.quiz_scores')
      AND sp.stub IN ('questionnaire_my_progress')
    ) AS t
    WHERE t.quiz_score->>'$.symptom_improvement_score' = true
      AND t.quiz_score->>'$.coping_skill_acquisition_score' = true
      AND t.quiz_score->>'$.increased_self_awareness_score' = true
      AND t.quiz_score->>'$.account_age_in_days' > 1;
    

    Honestly, workarounds like this make it apparent that JSON is not actually an efficient idea when you need to use SQL predicates to search or sort fields within a JSON document.


    Re your comment:

    The problem you are facing is that MySQL doesn't convert JSON booleans to SQL booleans transparently. A JSON boolean value like true or false is returned as binary strings 'true' or 'false', whereas MySQL booleans are unsigned integers 1 and 0 respectively.

    We can see this in the MySQL client when we run with the --column-type-info option:

    mysql> set @j = '{"f1": true, "f2": false}';
    
    mysql> select json_unquote(json_extract(@j, "$.f1")) as f1;
    Field   1:  `f1`
    Catalog:    `def`
    Database:   ``
    Table:      ``
    Org_table:  ``
    Type:       LONG_BLOB
    Collation:  utf8mb4_0900_ai_ci (255)
    Length:     4294967295
    Max_length: 4
    Decimals:   31
    Flags:      BINARY 
    
    +------+
    | f1   |
    +------+
    | true |
    +------+
    

    When you compare this string to the SQL true (which is really the integer 1), it compels MySQL to try to case the binary string 'true' to a comparable numeric value. MySQL casts strings to numeric values by examining any leading digits, and if there are no leading digits, it assumes the numeric value of the string is 0. So both 'true' and 'false' are cast to 0 (false).

    This is related to this bug: https://bugs.mysql.com/bug.php?id=99237 The workaround for that bug is to use a new MySQL 8.0 function JSON_VALUE() to extract the boolean and cast it to an unsigned value explicitly. You'd do something like:

    WHERE ...
      JSON_VALUE(t.quiz_score, '$.coping_skill_acquisition_score' RETURNING UNSIGNED) = true
    

    But of course JSON_VALUE() isn't implemented in MySQL 5.7.

    You can work around this in your case by comparing the extracted value to the string 'true' instead of the boolean literal true.

    WHERE t.quiz_score->>'$.symptom_improvement_score' = 'true'
      AND t.quiz_score->>'$.coping_skill_acquisition_score' = 'true'
      AND t.quiz_score->>'$.increased_self_awareness_score' = 'true'
    

    This is yet another example of why using JSON for relational data is a terrible idea. You end up spending a lot of time studying all the intricate semantics of type casting to work around these sorts of bugs.


    Testing this further, I discovered that JSON booleans are cast properly if you don't use the JSON unquoting. By using ->> you made the boolean cast to the binary string. But if you had used -> then the JSON boolean can implicitly cast to an SQL boolean.

    The following works, using non-unquoted JSON extraction and comparing to SQL boolean literals:

    WHERE t.quiz_score->'$.symptom_improvement_score' = true
      AND t.quiz_score->'$.coping_skill_acquisition_score' = true
      AND t.quiz_score->'$.increased_self_awareness_score' = true
    

    More and more layers of confusing semantics!