Search code examples
jsonsqlitesqlite-json1

how to extract properly when sqlite json has value as an array


I have a sqlite database and in one of the fields I have stored complete json object . I have to make some json select requests . If you see my json the ALL key has value which is an array . We need to extract some data like all comments where "pod" field is fb . How to extract properly when sqlite json has value as an array ?

select json_extract(data,'$."json"') from datatable ; gives me entire thing . Then I do select json_extract(data,'$."json"[0]') but i dont want to do it manually . i want to iterate .

kindly suggest some source where i can study and work on it . MY JSON

{
    "ALL": [{
            "comments": "your site is awesome",
            "pod": "passcode",
            "originalDirectory": "case1"
        },
        {
            "comments": "your channel is good",
            "data": ["youTube"],
            "pod": "library"
        },
        {
            "comments": "you like everything",
            "data": ["facebook"],
            "pod": "fb"
        },
        {
            "data": ["twitter"],
            "pod": "tw",
            "ALL": [{
                "data": [{
                    "codeLevel": "3"
                }],
                "pod": "mo",
                "pod2": "p"
            }]
        }
    ]
}



create table datatable ( path string , data json1 );
insert into datatable values("1" , json('<abovejson in a single line>'));

Solution

  • Simple List

    Where your JSON represents a "simple" list of comments, you want something like:

    select key, value 
      from datatable, json_each( datatable.data, '$.ALL' )
     where json_extract( value, '$.pod' ) = 'fb' ;
    

    which, using your sample data, returns:

    2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
    

    The use of json_each() returns a row for every element of the input JSON (datatable.data), starting at the path $.ALL (where $ is the top-level, and ALL is the name of your array: the path can be omitted if the top-level of the JSON object is required). In your case, this returns one row for each comment entry.

    The fields of this row are documented at 4.13. The json_each() and json_tree() table-valued functions in the SQLite documentation: the two we're interested in are key (very roughly, the "row number") and value (the JSON for the current element). The latter will contain elements called comment and pod, etc..

    Because we are only interested in elements where pod is equal to fb, we add a where clause, using json_extract() to get at pod (where $.pod is relative to value returned by the json_each function).

    Nested List

    If your JSON contains nested elements (something I didn't notice at first), then you need to use the json_tree() function instead of json_each(). Whereas the latter will only iterate over the immediate children of the node specified, json_tree() will descend recursively through all children from the node specified.

    To give us some data to work with, I have augmented your test data with an extra element:

    create table datatable ( path string , data json1 );
    insert into datatable values("1" , json('
        {
            "ALL": [{
                    "comments": "your site is awesome",
                    "pod": "passcode",
                    "originalDirectory": "case1"
                },
                {
                    "comments": "your channel is good",
                    "data": ["youTube"],
                    "pod": "library"
                },
                {
                    "comments": "you like everything",
                    "data": ["facebook"],
                    "pod": "fb"
                },
                {
                    "data": ["twitter"],
                    "pod": "tw",
                    "ALL": [{
                        "data": [{
                            "codeLevel": "3"
                        }],
                        "pod": "mo",
                        "pod2": "p"
                    },
                    {
                        "comments": "inserted by TripeHound",
                        "data": ["facebook"],
                        "pod": "fb"
                    }]
                }
            ]
        }
    '));
    

    If we were to simply switch to using json_each(), then we see that a simple query (with no where clause) will return all elements of the source JSON:

    select key, value 
      from datatable, json_tree( datatable.data, '$.ALL' ) limit 10 ;
    
    ALL|[{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"},{"comments":"your channel is good","data":["youTube"],"pod":"library"},{"comments":"you like everything","data":["facebook"],"pod":"fb"},{"data":["twitter"],"pod":"tw","ALL":[{"data":[{"codeLevel":"3"}],"pod":"mo","pod2":"p"},{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}]}]
    0|{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"}
    comments|your site is awesome
    pod|passcode
    originalDirectory|case1
    1|{"comments":"your channel is good","data":["youTube"],"pod":"library"}
    comments|your channel is good
    data|["youTube"]
    0|youTube
    pod|library
    

    Because JSON objects are mixed in with simple values, we can no longer simply add where json_extract( value, '$.pod' ) = 'fb' because this produces errors when value does not represent an object. The simplest way around this is to look at the type values returned by json_each()/json_tree(): these will be the string object if the row represents a JSON object (see above documentation for other values).

    Adding this to the where clause (and relying on "short-circuit evaluation" to prevent json_extract() being called on non-object rows), we get:

    select key, value
      from datatable, json_tree( datatable.data, '$.ALL' )
     where type = 'object'
       and json_extract( value, '$.pod' ) = 'fb' ;
    

    which returns:

    2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
    1|{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}
    

    If desired, we could use json_extract() to break apart the returned objects:

    .mode column
    .headers on
    .width 30 15 5
    select json_extract( value, '$.comments' ) as Comments,
           json_extract( value, '$.data' ) as Data,
           json_extract( value, '$.pod' ) as POD
      from datatable, json_tree( datatable.data, '$.ALL' )
     where type = 'object'
       and json_extract( value, '$.pod' ) = 'fb' ;
    
    Comments                        Data             POD
    ------------------------------  ---------------  -----
    you like everything             ["facebook"]     fb
    inserted by TripeHound          ["facebook"]     fb
    
    

    Note: If your structure contained other objects, of different formats, it may not be sufficient to simply select for type = 'object': you may have to devise a more subtle filtering process.