Search code examples
mysqlsqlmysql-json

A SQL query to filter through a stringified json array


I am struggling to write a sql query that can retrieve results from a table in a certain way.

I have a table that may have the following type of data. In this table, values in antecendents column are sorted lists of skus in string form

Id | antecendents | ...         | ....
1  | ["a","b","c"]| ...         | .....
2  | ["a"]        | ...         | .....
3  | ["a","b"]    | ...         | .....
4  | ["a","c"]    | ...         | .....
5  | ["a","c","x"]| ...         | .....
6  | ["a","y","c"]| ...         | .....
7  | ["c"]        | ...         | .....

Now suppose I have a set of skus (e.g. "a" and "c")

I want to retrieve only those rows from the table that have all combinations of "a" and "c" but nothing else. So my query would return the following

Id | antecendents | ...         | ....
2  | ["a"]        | ...         | .....
4  | ["a","c"]    | ...         | .....
7  | ["c"]        | ...         | .....

I could write a query to get a partial result and further filter it in code but it would be more efficient to have this all be done in a sql query.

Any help would be greatly appreciated.


Solution

  • I came up with this solution. Strip the 'a' element out, then with the resulting array, strip the 'c' element out, then compare that to an empty json_array(). If the antecedents minus the 'a' element minus the 'c' element is an empty array, then there are no other elements, and that is one of the rows you want.

    select id, antecedents from (
      select id, antecedents, coalesce(json_remove(antecedents, json_unquote(json_search(antecedents, 'one', 'a'))), antecedents) as a from mytable ) as t
    where coalesce(json_remove(a, json_unquote(json_search(a, 'one', 'c'))), a) = json_array()
    

    Result:

    +----+-------------+
    | id | antecedents |
    +----+-------------+
    |  2 | ["a"]       |
    |  4 | ["a", "c"]  |
    |  7 | ["c"]       |
    +----+-------------+
    

    But honestly, it's not practical to write code this complex. It's difficult to write, difficult to read, difficult to debug, and difficult to maintain or modify.

    Always keep this wisdom in mind:

    "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian Kernighan

    It would be much easier if you did not use a JSON array. Just store one element per row.

    create table mytable2 (id int, antecedent varchar(10), primary key (id, antecedent));
    
    insert into mytable2 values
    (1, 'a'), (1,'b'), (1,'c'),
    (2, 'a'),
    (3, 'a'), (3, 'b'),
    (4, 'a'), (4, 'c'),
    (5, 'a'), (5, 'c'), (5, 'x'),
    (6, 'a'), (6, 'y'), (6, 'c'),
    (7, 'c');
    

    Now it's much easier to conceive of the query. You can even return the sets of values as a JSON array.

    select m1.id, json_arrayagg(m1.antecedent) as antecedents
    from mytable2 m1 left outer join mytable2 as m2
      on m1.id = m2.id and m2.antecedent not in ('a','c')
    where m2.id is null
    group by m1.id
    

    Result:

    +----+-------------+
    | id | antecedents |
    +----+-------------+
    |  2 | ["a"]       |
    |  4 | ["a", "c"]  |
    |  7 | ["c"]       |
    +----+-------------+
    

    If you want to treat the elements as discrete elements in a set, a relational database already has ways of supporting that. Don't use JSON.