Search code examples
sqljsonsql-serverazure-sql-databaseazure-data-factory

How to import JSON file with different values into a SQL table?


I got a JSON file being generated each day. It could have different structure depends on the source data (attributes could be more or less in each JSON file). For example: File 1 has attributes: productCode, productType, sellerName, sellerCountry File 2 has all above attributes plus sellerState next file could have more or less attributes than file 2

How could I flatten this kind of JSON file into a SQL table dynamically?

File1:

[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  }
]

File2:

[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  },
  {
    "productCode": "00003",
    "productType": "Toy",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "AUS",
        "sellerState": "VIC"
      }
    ]
  }
]

Solution

  • It is possible to do this dynamically using dynamic SQL generation, but in most cases that is not necessary.

    In JSON files it is common to exclude properties for individual entities if their value is null. This means that although it looks like the output is dynamic systems that generate these files will usually conform to a standard schema. In this case there is a maximal structure that is possible but if you do not know this schema you can determine it from comparing files.

    Once you have a fixed schema, we can read all files according to that schema and the missing properties will be imported with a null value.

    A complicating factor in this input is that the sellerDetails is a nested array, in SQL this is a new dimension or table, so we need to join on the results of extracting the root element properties which will flatten the data.

    This is described in How to use OPENJSON on multiple rows

    Let us therefore define the maximal schema assuming that sellerDetails has the 3 properties: sellerName, sellerCountry, sellerState. If you identify more fields, add them to the WITH declaration.

    This fiddle demonstrates a solution in MS SQL Server: http://sqlfiddle.com/#!18/7d5fd/2

    I have unioned the two inputs into the same resultset so you can compare the output

    SELECT DISTINCT product.ProductCode, product.productType, seller.*
    FROM
    (
      SELECT @json1 as json
      UNION 
      SELECT @json2
    ) input
    CROSS APPLY OPENJSON (json)
    WITH (
        productCode varchar(10), 
        productType varchar(50), 
        sellerDetails NVARCHAR(MAX) as JSON
    ) product
    OUTER APPLY OPENJSON (product.sellerDetails) 
    WITH (
      sellerName varchar(50),
      sellerCountry varchar(50),
      sellerState varchar(50)
    ) seller
    
    ProductCode productType sellerName sellerCountry sellerState
    00001 Food Cosco UK NULL
    00002 Clothing NULL NULL NULL
    00003 Toy Cosco AUS VIC

    Notice the code ignores the missing sellerState from the first file and injects NULL

    UPDATE:

    Thank you to @Charlieface for pointing out that we should consider using OUTER APPLY OPENJSON (product.sellerDetails) for nested OPENJSON calls. CROSS APPLY behaves like an INNER JOIN, so using cross apply here results in fintering out product records that did not have any sellerDetails records, like 'Clothing' in this example.

    DISTINCT is also added to this query to remove the duplicate entries, product 00002 Clothing for instance appears in both files.


    These are the variable declarations:

    DECLARE @json1 varchar(max) = '[
      {
        "productCode": "00001",
        "productType": "Food",
        "sellerDetails": [
          {
            "sellerName": "Cosco",
            "sellerCountry": "UK"
          }
        ]
      },
      {
        "productCode": "00002",
        "productType": "Clothing"
      }
    ]
    ';
    DECLARE @json2 varchar(max) = '[
      {
        "productCode": "00001",
        "productType": "Food",
        "sellerDetails": [
          {
            "sellerName": "Cosco",
            "sellerCountry": "UK"
          }
        ]
      },
      {
        "productCode": "00002",
        "productType": "Clothing"
      },
      {
        "productCode": "00003",
        "productType": "Toy",
        "sellerDetails": [
          {
            "sellerName": "Cosco",
            "sellerCountry": "AUS",
            "sellerState": "VIC"
          }
        ]
      }
    ]
    ';