Search code examples
jsonsql-servert-sqlsql-server-2016entity-attribute-value

Read any JSON into list of key-value pairs (EAV format) in SQL Server


Looking for a way to read any JSON without knowing the internals with SQL Server's JSON methods, I came up with an approach I'd like to share.

Here is the question bringing up this issue.

The question is: How can I transform an unknown JSON into a structured EAV format while keeping all the information about sort order and nest levels.

The ideal output should carry the original row's id as the entity, the Json's key and value as the attribute and value together with the JsonPath for the specific object in a sorted list.

Find the MCVE (sample data from the linked question) embedded into my self-answer.


Solution

  • First we create a declared table variable and fill it with some sample JSONs to simulate the issue (I added some arrays to the samples to reflect JSON paths for arrays):

    DECLARE @table TABLE(ID INT IDENTITY, AnyJSON NVARCHAR(MAX));
    INSERT INTO @table VALUES
    (N' {
        "correlationId": "c3xOeEEQQCCA9sEx7-u6FA",
        "eventCreateTime": "2020-05-12T15:38:23.717Z",
        "time": 1589297903717,
        "owner": {
            "ownergeography": {
                "city": "abc",
                "country": "abc"
            },
            "ownername": {
                "firstname": "abc",
                "lastname": "def"
            },
            "clientApiKey": "xxxxx",
            "businessProfileApiKey": null,
            "userId": null
        },
        "campaignType": "Mobile push"
    }')
    ,(N'[{
        "correlationIds": [
            {
                "campaignId": [1,2,3],
                "correlationId": [{"a":"b"},{"c":"d"},{"e":"f"}]
            }
        ],
        "variantId": 1278915,
        "utmCampaign": "",
        "ua.os.major": "8"
        }
        ,{
        "correlationIds": [
            {
                "campaignId": [1,2,3],
                "correlationId": [{"a":"b"},{"c":"d"},{"e":"f"}]
            }
        ],
        "variantId": 1278915,
        "utmCampaign": "",
        "ua.os.major": "8"
        }]')
    ,(N'{
        "correlationId": "ls7XmuuiThWzktUeewqgWg",
        "eventCreateTime": "2020-05-12T12:40:20.786Z",
        "time": 1589287220786,
        "modifiedBy": {
            "clientId": null,
            "clientApiKey": "xxx",
            "businessProfileApiKey": null,
            "userId": null
        },
        "campaignType": "Mobile push"
    }');
    

    --The query

    WITH recCTE AS
    (
        SELECT ID
              ,NestLevel   = 0 
              ,ObjectIndex = CAST(1 AS bigint)                                                          
              ,SortString  = CAST(N'sort'                       COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX)) 
    
              ,JsonPath    = CAST(N'$'                          COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
              ,JsonKey     = CAST(N'$'                          COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX)) 
              ,JsonValue   = CAST(AnyJSON                       COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX)) 
              ,JsonType    = CAST(CASE WHEN LEFT(TRIM(AnyJSON),1)=N'[' THEN 4 ELSE 0 END AS TINYINT)
              ,NestedJSON  = CAST(CASE WHEN ISJSON(AnyJSON)=1 
                                       THEN AnyJSON 
                                       ELSE NULL END            COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX)) 
        FROM @table t
    
        UNION ALL
    
        SELECT r.ID
              ,r.NestLevel+1
              ,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) 
              ,CAST(CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),5),' ','0')) COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
    
              ,CAST(CONCAT(r.JsonPath, CASE WHEN r.JsonType=4 --<-- see the docs for OPENJSON()
                                            THEN CONCAT('[',A.[key],']') 
                                            ELSE '.' + A.[key] END)                       COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
              ,CAST(A.[key]                                                               COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
              ,CAST(r.JsonValue                                                           COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
              ,A.[type] 
              ,CAST(A.[value]                                                             COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
        FROM recCTE r
        CROSS APPLY OPENJSON(r.NestedJSON) A
        WHERE ISJSON(r.NestedJSON)=1
    )
    SELECT ID
          ,NestLevel
          ,ObjectIndex
          ,JsonPath
          ,JsonKey
          ,NestedJSON AS JsonValue
          ,SortString --<-- just to illustrate the sorting, not needed in the output
    FROM recCTE 
    WHERE ISJSON(NestedJSON)=0
    ORDER BY ID,SortString;
    

    The result

    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | ID | JsonPath                                  | JsonKey         | JsonValue                | SortString                      |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.correlationId                           | correlationId   | c3xOeEEQQCCA9sEx7-u6FA   | 0    1                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.eventCreateTime                         | eventCreateTime | 2020-05-12T15:38:23.717Z | 0    2                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.time                                    | time            | 1589297903717            | 0    3                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.owner.ownergeography.city               | city            | abc                      | 0    4    1    1                |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.owner.ownergeography.country            | country         | abc                      | 0    4    1    2                |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.owner.ownername.firstname               | firstname       | abc                      | 0    4    2    1                |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.owner.ownername.lastname                | lastname        | def                      | 0    4    2    2                |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.owner.clientApiKey                      | clientApiKey    | xxxxx                    | 0    4    3                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 1  | $.campaignType                            | campaignType    | Mobile push              | 0    5                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].correlationIds[0].campaignId[0]      | 0               | 1                        | 0    1    1    1    1    1      |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].correlationIds[0].campaignId[1]      | 1               | 2                        | 0    1    1    1    1    2      |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].correlationIds[0].campaignId[2]      | 2               | 3                        | 0    1    1    1    1    3      |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].correlationIds[0].correlationId[0].a | a               | b                        | 0    1    1    1    2    1    1 |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].correlationIds[0].correlationId[1].c | c               | d                        | 0    1    1    1    2    2    1 |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].correlationIds[0].correlationId[2].e | e               | f                        | 0    1    1    1    2    3    1 |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].variantId                            | variantId       | 1278915                  | 0    1    2                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].utmCampaign                          | utmCampaign     |                          | 0    1    3                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[0].ua.os.major                          | ua.os.major     | 8                        | 0    1    4                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].correlationIds[0].campaignId[0]      | 0               | 1                        | 0    2    1    1    1    1      |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].correlationIds[0].campaignId[1]      | 1               | 2                        | 0    2    1    1    1    2      |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].correlationIds[0].campaignId[2]      | 2               | 3                        | 0    2    1    1    1    3      |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].correlationIds[0].correlationId[0].a | a               | b                        | 0    2    1    1    2    1    1 |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].correlationIds[0].correlationId[1].c | c               | d                        | 0    2    1    1    2    2    1 |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].correlationIds[0].correlationId[2].e | e               | f                        | 0    2    1    1    2    3    1 |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].variantId                            | variantId       | 1278915                  | 0    2    2                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].utmCampaign                          | utmCampaign     |                          | 0    2    3                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 2  | $[1].ua.os.major                          | ua.os.major     | 8                        | 0    2    4                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 3  | $.correlationId                           | correlationId   | ls7XmuuiThWzktUeewqgWg   | 0    1                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 3  | $.eventCreateTime                         | eventCreateTime | 2020-05-12T12:40:20.786Z | 0    2                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 3  | $.time                                    | time            | 1589287220786            | 0    3                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 3  | $.modifiedBy.clientApiKey                 | clientApiKey    | xxx                      | 0    4    2                     |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    | 3  | $.campaignType                            | campaignType    | Mobile push              | 0    5                          |
    +----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
    

    The idea in short:

    • we use a recursive CTE to walk this down.
    • The query will test any fragment ([value] coming from OPENJSON) for being valid JSON.
    • If the fragment is valid, this walks deeper and deeper.
    • The column SortString is needed to get a final sort order.
    • The CAST() and COLLATE helps to avoid data type mismatch. Recursive CTEs are very picky with this...

    Hint: If you deal with bigger JSONs you might need to set OPTION (MAXRECURSION 0) at the end of your query.

    Enjoy :-)

    Something similar for XML

    Here is a similar answer about how to read an unknown XML.