Search code examples
sqlsql-servert-sqlsplitsql-server-2022

Split Variable Pairs into Columns?


Using SQL Server 2022, I'm trying to figure out how to use STRING_SPLIT in a set-based way to convert a table column where all rows have a variable number of key/value pairs into a table without duplicates, similar to T-SQL split string but not quite.

create table #mashed (mashedtext nvarchar(1600))

insert into #mashed 
values ('"id":"one", "code":"aaa", "dev":"yes"')
insert into #mashed 
values ('"id":"two", "code":"bbb", "dev":"yes"')
insert into #mashed 
values ('"id":"three", "code":"ccc", "dev":"no"')
insert into #mashed 
values ('"id":"three", "code":"bbb", "help":"no" , "rid":"6"')

Desired shape of output

   key    value
----------------
   id     one
   id     two
   id     three
   code   aaa
   code   bbb
   code   ccc
   dev    yes
   dev    no
   help   no
   rid    6

This is clearly wrong:

SELECT value 
FROM STRING_SPLIT (SELECT mashedtext FROM #mashed, ',') 

So what is the right way to do this without RBAR? I am sure there are two splits needed, once to get the pairs and again on each pair.

Thanks.


Solution

  • The stored text data is almost a valid JSON object, so you may try to fix it and parse it with OPENJSON():

    SELECT DISTINCT j.[key], j.[value]
    FROM #mashed m
    CROSS APPLY OPENJSON(CONCAT('{', m.mashedtext, '}')) j
    

    If you want to use STRING_SPLIT() you need to use the third optional parameter in STRING_SPLIT() (introduced in SQL Server 2022) to get the ordinal position of the each substring:

    SELECT DISTINCT s2.*
    FROM #mashed m
    CROSS APPLY STRING_SPLIT(m.mashedtext, ',') s1
    CROSS APPLY (
       SELECT
          [key] = MAX (REPLACE(CASE WHEN [ordinal] = 1 THEN [value] END, '"', '')),
          [value] = MAX (REPLACE(CASE WHEN [ordinal] = 2 THEN [value] END, '"', ''))
      FROM STRING_SPLIT(TRIM(s1.[value]), ':', 1)
    ) s2
    

    Result (without order):

    key value
    code aaa
    code bbb
    code ccc
    dev no
    dev yes
    help no
    id one
    id three
    id two
    rid 6