I am attempting to read multiple JSON files using Synapse from a data lake storage container. I can successfully parse an example file pre-loaded in the SQL script, but I want to iterate through multiple files in the data lake (using OPENROWSET) and present the results in a table.
So far, I have:
SELECT
iotdata.did as Device, iotdata.dt as DeviceType, iotdata.c as PayloadComment, iotdata.mid as ManufactureId, iotdata.t as PayloadTimeStamp,
ds.tp as DataType, ds.t as DataSetTimeStamp,
de.*
FROM
OPENROWSET(
BULK 'https://xxxx.dfs.core.windows.net/raw/100/100/xxxx/2023/11/10/*.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b',
ROWTERMINATOR = '0x0b'
)
WITH (
did nvarchar(100),
dt nvarchar(100),
c nvarchar(100),
mid nvarchar(100),
t nvarchar(100),
ds nvarchar(max)
) as iotdata
cross apply openjson (iotdata.ds)
with
(
tp nvarchar(100),
t nvarchar(100),
de nvarchar(max) as json
) as ds
cross apply openjson (ds.de)
with
(
ch int '$.ch',
m nvarchar(100)
)
as de
This produces error:
Statement ID: {251445DC-2D85-4044-BF2F-34FE862FA8BA} | Query hash: 0x4C7DD120748CDBE7 | Distributed request ID: {F813A165-E978-442F-B61B-704D72E754E1}. Total size of data scanned is 2 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes. Msg 13812, Level 16, State 3, Line 1 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (did) in https://xxxx.dfs.core.windows.net/raw/100/100/R00016/2023/11/10/xxx-Received-20231110080054-MID-2xxx.json.
What am I doing wrong?
Working SQL example, with a single JSON in embedded in the SQL
declare @json2 nvarchar(max)
set @json2 = '
{
"did": "0001",
"mid": "100",
"dt": 100,
"t": "2023-11-10T00:00:50.046Z",
"c": "comment",
"ds": [
{
"t": "2023-11-09T20:16:56.143Z",
"tp": "100",
"de": [
{
"ch": 100,
"m": "2023-11-09T20:16:36.037Z"
},
{
"ch": 101,
"m": "2023-11-09T20:16:56.143Z"
}
]
},
{
"t": "2023-11-10T00:00:12.158Z",
"tp": "101",
"de": [
{
"ch": 120,
"m": 13.189029
},
{
"ch": 121,
"m": 77.434937
}
]
},
{
"t": "2023-11-10T00:00:07.054Z",
"tp": "101",
"de": [
{
"ch": 120,
"m": 13.190003
},
{
"ch": 121,
"m": 77.44736
}
]
},
{
"t": "2023-11-09T22:48:42.351Z",
"tp": "100",
"de": [
{
"ch": 100,
"m": "2023-11-09T22:47:10.731Z"
},
{
"ch": 101,
"m": "2023-11-09T22:48:42.351Z"
}
]
},
{
"t": "2023-11-10T00:00:50.014Z",
"tp": "103",
"de": [
{
"ch": 145,
"m": 35.935687
},
{
"ch": 146,
"m": -96.062258
}
]
}
]
}';
select
iotdata.did as Device, iotdata.dt as DeviceType, iotdata.c as PayloadComment, iotdata.mid as ManufactureId, iotdata.t as PayloadTimeStamp,
ds.tp as DataType, ds.t as DataSetTimeStamp,
de.*
from openjson (@json2)
with
(
did nvarchar(100),
dt nvarchar(100),
c nvarchar(100),
mid nvarchar(100),
t nvarchar(100),
ds nvarchar(max) as json
)
as iotdata
cross apply openjson (iotdata.ds)
with
(
tp nvarchar(100),
t nvarchar(100),
de nvarchar(max) as json
) as ds
cross apply openjson (ds.de)
with
(
ch int '$.ch',
m nvarchar(100)
)
as de
Desired table output
╔════════╦════════════╦════════════════╦═══════════════╦══════════════════════════╦══════════╦══════════════════════════╦═════╦══════════════════════════╗
║ Device ║ DeviceType ║ PayloadComment ║ ManufactureId ║ PayloadTimeStamp ║ DataType ║ DataSetTimeStamp ║ ch ║ m ║
╠════════╬════════════╬════════════════╬═══════════════╬══════════════════════════╬══════════╬══════════════════════════╬═════╬══════════════════════════╣
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 100 ║ 2023-11-09T20:16:56.143Z ║ 100 ║ 2023-11-09T20:16:36.037Z ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 100 ║ 2023-11-09T20:16:56.143Z ║ 101 ║ 2023-11-09T20:16:56.143Z ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 101 ║ 2023-11-10T00:00:12.158Z ║ 120 ║ 13.189029 ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 101 ║ 2023-11-10T00:00:12.158Z ║ 121 ║ 77.434937 ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 101 ║ 2023-11-10T00:00:07.054Z ║ 120 ║ 13.190003 ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 101 ║ 2023-11-10T00:00:07.054Z ║ 121 ║ 77.44736 ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 100 ║ 2023-11-09T22:48:42.351Z ║ 100 ║ 2023-11-09T22:47:10.731Z ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 100 ║ 2023-11-09T22:48:42.351Z ║ 101 ║ 2023-11-09T22:48:42.351Z ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 103 ║ 2023-11-10T00:00:50.014Z ║ 145 ║ 35.935687 ║
║ 0001 ║ 100 ║ comment ║ 100 ║ 2023-11-10T00:00:50.046Z ║ 103 ║ 2023-11-10T00:00:50.014Z ║ 146 ║ -96.062258 ║
╚════════╩════════════╩════════════════╩═══════════════╩══════════════════════════╩══════════╩══════════════════════════╩═════╩══════════════════════════╝
A single example JSON file:
{
"did": "0000",
"mid": "100",
"dt": 100,
"t": "2023-11-09T12:00:54.055Z",
"c": "",
"ds": [
{
"t": "2023-11-09T12:00:07.508Z",
"tp": "101",
"de": [
{
"ch": 120,
"m": 13.133502
},
{
"ch": 121,
"m": 77.360396
},
{
"ch": 122,
"m": "2023-11-09T12:00:07.508Z"
},
{
"ch": 123,
"m": 26677
},
{
"ch": 124,
"m": 5099.845
},
{
"ch": 125,
"m": 1
},
{
"ch": 126,
"m": "A"
},
{
"ch": 127,
"m": 0.02119
},
{
"ch": 128,
"m": 0
},
{
"ch": 129,
"m": 60273
}
]
},
{
"t": "2023-11-09T12:00:54.018Z",
"tp": "103",
"de": [
{
"ch": 145,
"m": 35.935687
},
{
"ch": 146,
"m": -96.062258
}
]
},
{
"t": "2023-11-09T12:00:12.612Z",
"tp": "101",
"de": [
{
"ch": 120,
"m": 13.133502
},
{
"ch": 121,
"m": 77.347973
},
{
"ch": 122,
"m": "2023-11-09T12:00:12.612Z"
},
{
"ch": 123,
"m": 2
},
{
"ch": 124,
"m": 0.0
},
{
"ch": 125,
"m": 1
},
{
"ch": 126,
"m": "A"
},
{
"ch": 127,
"m": 0.025042
},
{
"ch": 128,
"m": 1
},
{
"ch": 129,
"m": 0
}
]
}
]
}
When Json object is stored in the variable json2
, you used openjson
to parse the Json variable json2
. If you want to read and parse the Json file, you need to use JSONVALUE
for parsing the Json value that are read with openrowset
. Below is the modified code.
Code:
SELECT
JSON_VALUE(ds,'$.did') as deviceid,JSON_VALUE(ds, '$.dt') as DeviceType, JSON_VALUE(ds,'$.c') as PayloadComment,
JSON_VALUE(ds,'$.mid') as ManufactureId, JSON_VALUE(ds,'$.t') as PayloadTimeStamp,
ds1.tp as [DataType], ds1.t as DataSetTimeStamp
,ch,m
FROM
OPENROWSET(
BULK 'https://xxx.blob.core.windows.net/xxx/xx/xxx.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b',
ROWTERMINATOR = '0x0b'
)
WITH (
ds nvarchar(max)
) as iotdata
cross apply openjson (ds,'$.ds')
with
(
tp nvarchar(100) '$.tp',
t nvarchar(100) '$.t',
de nvarchar(max) as json
) as ds1
cross apply openjson (de)
with
(
ch int '$.ch',
m nvarchar(100) '$.m'
) as de
Output:
Output data for the above code with sample Json file ,
deviceid | DeviceType | PayloadComment | ManufactureId | PayloadTimeStamp | DataType | DataSetTimeStamp | ch | m |
---|---|---|---|---|---|---|---|---|
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 100 | 2023-11-09T20:16:56.143Z | 100 | 2023-11-09T20:16:36.037Z |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 100 | 2023-11-09T20:16:56.143Z | 101 | 2023-11-09T20:16:56.143Z |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 101 | 2023-11-10T00:00:12.158Z | 120 | 13.189029 |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 101 | 2023-11-10T00:00:12.158Z | 121 | 77.434937 |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 101 | 2023-11-10T00:00:07.054Z | 120 | 13.190003 |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 101 | 2023-11-10T00:00:07.054Z | 121 | 77.44736 |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 100 | 2023-11-09T22:48:42.351Z | 100 | 2023-11-09T22:47:10.731Z |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 100 | 2023-11-09T22:48:42.351Z | 101 | 2023-11-09T22:48:42.351Z |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 103 | 2023-11-10T00:00:50.014Z | 145 | 35.935687 |
0001 | 100 | comment | 100 | 2023-11-10T00:00:50.046Z | 103 | 2023-11-10T00:00:50.014Z | 146 | -96.062258 |