I have a JSON file that is generated each day. Its structure can differ depending on the source data (each file can contain more or fewer attributes). For example: File 1 has the attributes productCode, productType, sellerName, sellerCountry. File 2 has all of the above plus sellerState. The next file could have more or fewer attributes than File 2.
How could I flatten this kind of JSON file into a SQL table dynamically?
File1:
```json
[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  }
]
```
File2:
```json
[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  },
  {
    "productCode": "00003",
    "productType": "Toy",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "AUS",
        "sellerState": "VIC"
      }
    ]
  }
]
```
It is possible to do this dynamically using dynamic SQL generation, but in most cases that is not necessary.
In JSON files it is common to omit properties for individual entities when their value is `null`. So although the output looks dynamic, the systems that generate these files usually conform to a standard schema: there is a maximal structure that is possible, and if you do not know that schema in advance you can determine it by comparing files.
Once you have a fixed schema, we can read all files according to it, and the missing properties will simply be imported as `null` values.
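If you prefer to discover the property names programmatically rather than by eyeballing the files, `OPENJSON` without a `WITH` clause returns one row per key/value pair. Here is a minimal sketch of that discovery step, assuming the file content has already been loaded into a variable (the `@json` variable and its sample content are illustrative):

```sql
DECLARE @json nvarchar(max) = N'[
  { "productCode": "00003", "productType": "Toy",
    "sellerDetails": [ { "sellerName": "Cosco", "sellerCountry": "AUS", "sellerState": "VIC" } ] }
]';

-- One row per distinct property name found on any root-level object
SELECT DISTINCT prop.[key] AS propertyName
FROM OPENJSON (@json) AS element                -- one row per array element
CROSS APPLY OPENJSON (element.[value]) AS prop; -- one row per property of that element
```

Running this over each day's file and unioning the results gives you the maximal set of root-level properties; the nested `sellerDetails` objects can be inspected the same way.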
A complicating factor in this input is that `sellerDetails` is a nested array. In SQL that is a new dimension, effectively a child table, so we need to join the root element properties onto the result of extracting the nested array, which flattens the data. This is described in How to use OPENJSON on multiple rows.
Let us therefore define the maximal schema, assuming that `sellerDetails` has the three properties `sellerName`, `sellerCountry`, and `sellerState`. If you identify more fields, add them to the `WITH` declaration.
This fiddle demonstrates a solution in MS SQL Server: http://sqlfiddle.com/#!18/7d5fd/2
I have unioned the two inputs into the same result set so you can compare the output:
```sql
SELECT DISTINCT product.productCode, product.productType, seller.*
FROM
(
    SELECT @json1 AS json
    UNION
    SELECT @json2
) input
CROSS APPLY OPENJSON (json)
WITH (
    productCode   varchar(10),
    productType   varchar(50),
    sellerDetails nvarchar(max) AS JSON
) product
OUTER APPLY OPENJSON (product.sellerDetails)
WITH (
    sellerName    varchar(50),
    sellerCountry varchar(50),
    sellerState   varchar(50)
) seller
```
| productCode | productType | sellerName | sellerCountry | sellerState |
|-------------|-------------|------------|---------------|-------------|
| 00001       | Food        | Cosco      | UK            | NULL        |
| 00002       | Clothing    | NULL       | NULL          | NULL        |
| 00003       | Toy         | Cosco      | AUS           | VIC         |
Notice that the query tolerates the missing `sellerState` in the first file and injects `NULL` instead.
Thank you to @Charlieface for pointing out that we should use `OUTER APPLY OPENJSON (product.sellerDetails)` for nested `OPENJSON` calls. `CROSS APPLY` behaves like an `INNER JOIN`, so using it here would filter out `product` records that do not have any `sellerDetails` records, like 'Clothing' in this example.
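To see the difference, here is a minimal sketch of the nested extraction with `CROSS APPLY` instead of `OUTER APPLY` (it reuses the `@json1` variable declared below). Because product 00002 has no `sellerDetails` array, the inner `OPENJSON` produces no rows for it and the whole record disappears:

```sql
SELECT product.productCode, seller.sellerName
FROM OPENJSON (@json1)
WITH (
    productCode   varchar(10),
    sellerDetails nvarchar(max) AS JSON
) product
CROSS APPLY OPENJSON (product.sellerDetails)
WITH (
    sellerName varchar(50)
) seller
-- Returns only 00001; with OUTER APPLY, 00002 also appears with a NULL sellerName.
```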
`DISTINCT` is also added to the query to remove duplicate entries; product 00002 'Clothing', for instance, appears in both files.
These are the variable declarations:
```sql
DECLARE @json1 varchar(max) = '[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  }
]';

DECLARE @json2 varchar(max) = '[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  },
  {
    "productCode": "00003",
    "productType": "Toy",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "AUS",
        "sellerState": "VIC"
      }
    ]
  }
]';
```
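In production the JSON would come from the daily file rather than from a variable. Here is a minimal sketch of that last step, assuming SQL Server can read the file from disk; the file path and the target table `dbo.ProductSellers` are hypothetical and should be replaced with your own:

```sql
-- Load the whole file into a variable (the path is illustrative)
DECLARE @json nvarchar(max) =
(
    SELECT BulkColumn
    FROM OPENROWSET (BULK 'C:\data\products.json', SINGLE_CLOB) AS src
);

-- Flatten and persist using the same fixed maximal schema
INSERT INTO dbo.ProductSellers (productCode, productType, sellerName, sellerCountry, sellerState)
SELECT product.productCode, product.productType,
       seller.sellerName, seller.sellerCountry, seller.sellerState
FROM OPENJSON (@json)
WITH (
    productCode   varchar(10),
    productType   varchar(50),
    sellerDetails nvarchar(max) AS JSON
) product
OUTER APPLY OPENJSON (product.sellerDetails)
WITH (
    sellerName    varchar(50),
    sellerCountry varchar(50),
    sellerState   varchar(50)
) seller;
```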