Search code examples
arraysjsont-sqldynamic

T-SQL parsing dynamic JSON Array


I'm using MSSQL 2017 and have the JSON outputs of a name parsing service stored in a table:

CREATE TABLE test_data (
   ID int,
   fullName nvarchar(50),
   jsonString nvarchar(max)
)
INSERT INTO test_data
   (ID, fullName, jsonString)
VALUES
   (1, N'Dr. Damian Summer', N'{"parsedPerson": {"personType": "NATURAL","personRole": "PRIMARY","mailingPersonRoles": ["ADDRESSEE"],"gender": {"gender": "MALE","confidence": 0.8199955555555556},"addressingGivenName": "Damian","addressingSurname": "Summer","outputPersonName": {"terms": [{"string": "Dr.","termType": "TITLE"},{"string": "Damian","termType": "GIVENNAME"},{"string": "Summer","termType": "SURNAME"}]}},"parserDisputes": [],"likeliness": 0.9480247866561989,"confidence": 0.9451463934269034}'),
   (2, N'Rick Da Costa + M. Tosh', N'{"parsedPerson":{"personType":"MULTIPLE","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"people":[{"personType":"NATURAL","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"gender":{"gender":"MALE","confidence":1.0},"addressingGivenName":"Rick","addressingSurname":"Da Costa","outputPersonName":{"terms":[{"string":"Rick","termType":"GIVENNAME"},{"string":"Da Costa","termType":"SURNAME"}]}},{"personType":"NATURAL","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"gender":{"gender": "UNKNOWN","confidence": 0.6400000000000001},"addressingSurname":"Tosh","outputPersonName":{"terms":[{"string": "M.","termType":"GIVENNAMEINITIAL"},{"string":"Tosh","termType":"SURNAME"}]}}]},"parserDisputes":[],"likeliness": 0.6503418684197658,"confidence": 0.8653094883324904}'),
   (3, N'Cygan Marianne + Edgar', N'{"parsedPerson":{"personType":"MULTIPLE","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"people":[{"personType":"NATURAL","personRole":"MEMBER","mailingPersonRoles":["MEMBER"],"gender":{"gender":"FEMALE","confidence":0.94667621434538729},"addressingGivenName":"Marianne","addressingSurname":"Cygan","outputPersonName":{"terms":[{"string":"Cygan","termType":"SURNAME"},{"string":"Marianne","termType":"GIVENNAME"}]}},{"personType":"NATURAL","personRole":"MEMBER","mailingPersonRoles":["MEMBER"],"gender":{"gender":"MALE","confidence":0.95406313536650611},"addressingGivenName":"Edgar","addressingSurname":"Cygan","outputPersonName":{"terms":[{"string":"Cygan","termType":"SURNAME"},{"string":"Edgar","termType":"GIVENNAME"}]}}]},"parserDisputes":[],"likeliness":0.8933697196920577,"confidence":0.96231315577078291}')
 

A record can contain one person (personType = NATURAL) or multiple persons (personType = MULTIPLE) and for each person found, the name is split into components (title, givenname, surname, ...) which are saved in the "terms" array. I want to parse this array for both person types and for every person found.

The result should look like this:

-----------------------------------------------------------------------------------------
ID  fullName            person          termType        string      
-----------------------------------------------------------------------------------------
1   Dr. Damian Summer                   TITLE           Dr.
1   Dr. Damian Summer                   GIVENNAME       Damian
1   Dr. Damian Summer                   SURNAME         Summer
2   Rick Da Costa + M. Tosh     1       GIVENNAME       Rick
2   Rick Da Costa + M. Tosh     1       SURNAME         Da Costa
2   Rick Da Costa + M. Tosh     2       GIVENNAMEINITIAL    M.
2   Rick Da Costa + M. Tosh     2       SURNAME         Tosh
3   Cygan Marianne + Edgar      1       GIVENNAME       Marianne
3   Cygan Marianne + Edgar      1       SURNAME         Cygan
3   Cygan Marianne + Edgar      2       GIVENNAME       Edgar
3   Cygan Marianne + Edgar      2       SURNAME         Cygan

How can I get all the values of "termType" and "string" as well as the number of the person? I have made a few attempts with OPENJSON and cross apply, but without success.

Is there a way to parse both personTypes at the same time? Or do I have to do one pass for NATURAL and one for MULTIPLE?


Solution

  • Ok, I found a solution, though it may not be the most elegant one. However, this requires three passes, one for personType NATURAL and one for each person in personType MULTIPLE:

        --NATURAL 
        SELECT d.ID,
           d.fullName,
           j2.termType,
           j2.string
        FROM test_data d
        CROSS APPLY OPENJSON(d.jsonString, '$.parsedPerson.outputPersonName.terms') j1
        CROSS APPLY
        OPENJSON(j1.Value, '$')
        WITH
        (
            string NVARCHAR(50) '$.string',
            termType NVARCHAR(20) '$.termType'
        ) j2;
    
        --MULTIPLE Person 1
        SELECT d.ID,
           d.fullName,
           j2.termType,
           j2.string
        FROM test_data d
        CROSS APPLY OPENJSON(d.jsonString, '$.parsedPerson.people[0].outputPersonName.terms') j1
        CROSS APPLY
        OPENJSON(j1.Value, '$')
        WITH
        (
            string NVARCHAR(50) '$.string',
            termType NVARCHAR(20) '$.termType'
        ) j2;
    
    --MULTIPLE Person 2
        SELECT d.ID,
           d.fullName,
           j2.termType,
           j2.string
        FROM test_data d
        CROSS APPLY OPENJSON(d.jsonString, '$.parsedPerson.people[1].outputPersonName.terms') j1
        CROSS APPLY
        OPENJSON(j1.Value, '$')
        WITH
        (
            string NVARCHAR(50) '$.string',
            termType NVARCHAR(20) '$.termType'
        ) j2;
    

    Any simplifications and optimizations are welcome.