Using SQL Server 2022, I'm trying to figure out how to use STRING_SPLIT
in a set-based way to convert a table column where all rows have a variable number of key/value pairs into a table without duplicates, similar to T-SQL split string but not quite.
create table #mashed (mashedtext nvarchar(1600))
insert into #mashed
values ('"id":"one", "code":"aaa", "dev":"yes"')
insert into #mashed
values ('"id":"two", "code":"bbb", "dev":"yes"')
insert into #mashed
values ('"id":"three", "code":"ccc", "dev":"no"')
insert into #mashed
values ('"id":"three", "code":"bbb", "help":"no" , "rid":"6"')
Desired shape of output
key value
----------------
id one
id two
id three
code aaa
code bbb
code ccc
dev yes
dev no
help no
rid 6
This is clearly wrong:
SELECT value
FROM STRING_SPLIT (SELECT mashedtext FROM #mashed, ',')
So what is the right way to do this without RBAR? I am sure there are two splits needed, once to get the pairs and again on each pair.
Thanks.
The stored text data is almost a valid JSON object, so you may try to fix it and parse it with OPENJSON()
:
SELECT DISTINCT j.[key], j.[value]
FROM #mashed m
CROSS APPLY OPENJSON(CONCAT('{', m.mashedtext, '}')) j
If you want to use STRING_SPLIT()
you need to use the third optional parameter in STRING_SPLIT()
(introduced in SQL Server 2022) to get the ordinal position of the each substring:
SELECT DISTINCT s2.*
FROM #mashed m
CROSS APPLY STRING_SPLIT(m.mashedtext, ',') s1
CROSS APPLY (
SELECT
[key] = MAX (REPLACE(CASE WHEN [ordinal] = 1 THEN [value] END, '"', '')),
[value] = MAX (REPLACE(CASE WHEN [ordinal] = 2 THEN [value] END, '"', ''))
FROM STRING_SPLIT(TRIM(s1.[value]), ':', 1)
) s2
Result (without order):
key | value |
---|---|
code | aaa |
code | bbb |
code | ccc |
dev | no |
dev | yes |
help | no |
id | one |
id | three |
id | two |
rid | 6 |