Search code examples
mysqlsqlmysql-json

in mysql, how to delete elements in json array that don't contain certain string values?


I have table with a column of jsons that have example value:

{"name": "1", "list": ["abc", "xyz"]}
{"name": "2", "list": ["b", "bc", "bcd"]}
{"name": "3", "list": ["cd", "cdef", "def", "defg", "ef"]}

I also have a array of (sub)strings that I want to keep in the array, for example:
["a","bc","def"]
(in my reality, about 20 strings)

How do I update the table so that only elements of the list that contain at least one of the strings remain?

expected result:

{"name": "1", "list": ["abc"]}
{"name": "2", "list": ["bc", "bcd"]}
{"name": "3", "list": ["cdef", "def", "defg"]}

honestly, i have no clue where to even begin, or even if it's possible in sql with its json functions.


Solution

  • Here is an example. The idea is to split the values from the list arrays and check for matches using REGEXP. Then build new list arrays containing only the matched elements. After that, update the original table.

    CREATE TABLE DataSource 
    ( 
        id INT AUTO_INCREMENT PRIMARY KEY
       ,RawData JSON
    );
    
    INSERT INTO DataSource (RawData) VALUES 
        ('{"name": "1", "list": ["abc", "xyz"]}'),
        ('{"name": "2", "list": ["b", "bc", "bcd"]}'),
        ('{"name": "3", "list": ["cd", "cdef", "def", "defg", "ef"]}');
    
    SET @keepList = '["a","bc","def"]';
    SET @keepList = REPLACE(REPLACE(REPLACE(REPLACE(@keepList, '[', ''), ']', ''), ',', '|'), '"', '');
    
    WITH NewDataSource (ID, name, value) AS
    (
          SELECT DS.ID
                ,JSON_EXTRACT(DS.RawData, '$.name')
                ,JSON_ARRAYAGG(j.value)
          FROM DataSource DS,
          JSON_TABLE(DS.RawData, '$.list[*]' COLUMNS (value VARCHAR(255) PATH '$')) AS j
          WHERE j.value REGEXP  @keepList
          GROUP BY DS.ID
                  ,JSON_EXTRACT(DS.RawData, '$.name')
     ) 
    UPDATE DataSource T
    INNER JOIN NewDataSource S
        ON T.id = S.id
    SET T.RawData = JSON_OBJECT
                   (
                      'name', S.name
                     ,'list', S.value
                   );
                   
    SELECT *
    FROM DataSource;
    

    enter image description here