I'm using MSSQL 2017 and have the JSON outputs of a name parsing service stored in a table:
CREATE TABLE test_data (
ID int,
fullName nvarchar(50),
jsonString nvarchar(max)
)
INSERT INTO test_data
(ID, fullName, jsonString)
VALUES
(1, N'Dr. Damian Summer', N'{"parsedPerson": {"personType": "NATURAL","personRole": "PRIMARY","mailingPersonRoles": ["ADDRESSEE"],"gender": {"gender": "MALE","confidence": 0.8199955555555556},"addressingGivenName": "Damian","addressingSurname": "Summer","outputPersonName": {"terms": [{"string": "Dr.","termType": "TITLE"},{"string": "Damian","termType": "GIVENNAME"},{"string": "Summer","termType": "SURNAME"}]}},"parserDisputes": [],"likeliness": 0.9480247866561989,"confidence": 0.9451463934269034}'),
(2, N'Rick Da Costa + M. Tosh', N'{"parsedPerson":{"personType":"MULTIPLE","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"people":[{"personType":"NATURAL","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"gender":{"gender":"MALE","confidence":1.0},"addressingGivenName":"Rick","addressingSurname":"Da Costa","outputPersonName":{"terms":[{"string":"Rick","termType":"GIVENNAME"},{"string":"Da Costa","termType":"SURNAME"}]}},{"personType":"NATURAL","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"gender":{"gender": "UNKNOWN","confidence": 0.6400000000000001},"addressingSurname":"Tosh","outputPersonName":{"terms":[{"string": "M.","termType":"GIVENNAMEINITIAL"},{"string":"Tosh","termType":"SURNAME"}]}}]},"parserDisputes":[],"likeliness": 0.6503418684197658,"confidence": 0.8653094883324904}'),
(3, N'Cygan Marianne + Edgar', N'{"parsedPerson":{"personType":"MULTIPLE","personRole":"PRIMARY","mailingPersonRoles":["ADDRESSEE"],"people":[{"personType":"NATURAL","personRole":"MEMBER","mailingPersonRoles":["MEMBER"],"gender":{"gender":"FEMALE","confidence":0.94667621434538729},"addressingGivenName":"Marianne","addressingSurname":"Cygan","outputPersonName":{"terms":[{"string":"Cygan","termType":"SURNAME"},{"string":"Marianne","termType":"GIVENNAME"}]}},{"personType":"NATURAL","personRole":"MEMBER","mailingPersonRoles":["MEMBER"],"gender":{"gender":"MALE","confidence":0.95406313536650611},"addressingGivenName":"Edgar","addressingSurname":"Cygan","outputPersonName":{"terms":[{"string":"Cygan","termType":"SURNAME"},{"string":"Edgar","termType":"GIVENNAME"}]}}]},"parserDisputes":[],"likeliness":0.8933697196920577,"confidence":0.96231315577078291}')
A record can contain one person (personType = NATURAL) or multiple persons (personType = MULTIPLE) and for each person found, the name is split into components (title, givenname, surname, ...) which are saved in the "terms" array. I want to parse this array for both person types and for every person found.
The result should look like this:
-----------------------------------------------------------------------------------------
ID fullName person termType string
-----------------------------------------------------------------------------------------
1 Dr. Damian Summer TITLE Dr.
1 Dr. Damian Summer GIVENNAME Damian
1 Dr. Damian Summer SURNAME Summer
2 Rick Da Costa + M. Tosh 1 GIVENNAME Rick
2 Rick Da Costa + M. Tosh 1 SURNAME Da Costa
2 Rick Da Costa + M. Tosh 2 GIVENNAMEINITIAL M.
2 Rick Da Costa + M. Tosh 2 SURNAME Tosh
3 Cygan Marianne + Edgar 1 GIVENNAME Marianne
3 Cygan Marianne + Edgar 1 SURNAME Cygan
3 Cygan Marianne + Edgar 2 GIVENNAME Edgar
3 Cygan Marianne + Edgar 2 SURNAME Cygan
How can I get all the values of "termType" and "string" as well as the number of the person? I have made a few attempts with OPENJSON and cross apply, but without success.
Is there a way to parse both personTypes at the same time? Or do I have to do one pass for NATURAL and one for MULTIPLE?
Ok, I found a solution, though it may not be the most elegant one. However, this requires three passes, one for personType NATURAL and one for each person in personType MULTIPLE:
--NATURAL
SELECT d.ID,
d.fullName,
j2.termType,
j2.string
FROM test_data d
CROSS APPLY OPENJSON(d.jsonString, '$.parsedPerson.outputPersonName.terms') j1
CROSS APPLY
OPENJSON(j1.Value, '$')
WITH
(
string NVARCHAR(50) '$.string',
termType NVARCHAR(20) '$.termType'
) j2;
--MULTIPLE Person 1
SELECT d.ID,
d.fullName,
j2.termType,
j2.string
FROM test_data d
CROSS APPLY OPENJSON(d.jsonString, '$.parsedPerson.people[0].outputPersonName.terms') j1
CROSS APPLY
OPENJSON(j1.Value, '$')
WITH
(
string NVARCHAR(50) '$.string',
termType NVARCHAR(20) '$.termType'
) j2;
--MULTIPLE Person 2
SELECT d.ID,
d.fullName,
j2.termType,
j2.string
FROM test_data d
CROSS APPLY OPENJSON(d.jsonString, '$.parsedPerson.people[1].outputPersonName.terms') j1
CROSS APPLY
OPENJSON(j1.Value, '$')
WITH
(
string NVARCHAR(50) '$.string',
termType NVARCHAR(20) '$.termType'
) j2;
Any simplifications and optimizations are welcome.