Search code examples
t-sqlsplituser-defined-functionssql-server-2017

Need to split column into rows and columns


I have a table like this:

ID  cst
1   string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8,string9, 65
2   string10;-3;string11;string12;56;string13;6;string14;string15;9
etc.

Now I want to split the cst column into 5 columns and multiple rows. So like this:

ID  C1       C2  C3        C4        C5
1   string1   3  string2   string3   34
1   string4  -1  string5   string6   12
1   string7   5  string8   string9   65
2   string10 -3  string11  string12  56
2   string13  6  string14  string15   9
etc.

How to accomplish this? I am on SQL-server 2017, so I can use the string_split function. The problem with this function is that it produces only one output column...

Preferably I would like yo create an UDF that outputs a table. The function would use these input parameters: the string, the separator character, the number of columns. So the function can be used dynamically with a varying number of columns.

ps. the strings can be of variable length of course.


Solution

  • Try it along this:

    Hint: There are some "normal" commas in your sample data. I suspected these as wrong and used semicolons. If this is wrong, you might use a general REPLACE() to use ";" instead of ",".

    Create a declared table to simulate your issue

    DECLARE @tbl TABLE(ID INT, cst VARCHAR(1000));
    INSERT INTO @tbl(ID,cst) 
    VALUES(1,'string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8;string9; 65')
         ,(2,'string10;-3;string11;string12;56;string13;6;string14;string15;9');
    

    --The query (for almost any version of SQL-Server, find v2017+ as UPDATE below)

    WITH cte AS
    (
        SELECT t.ID
              ,B.Nr
              ,A.Casted.value('(/x[sql:column("B.Nr")]/text())[1]','varchar(max)') AS ValueAtPosition
              ,(B.Nr-1) % 5 AS Position
              ,(B.Nr-1)/5 AS GroupingKey
        FROM @tbl t
        CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.cst,';','</x><x>') + '</x>' AS XML)) A(Casted)
        CROSS APPLY(SELECT TOP(A.Casted.value('count(x)','int')) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM master..spt_values) B(Nr)
    )
    SELECT ID
          ,GroupingKey
          ,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
          ,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
          ,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
          ,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
          ,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
    FROM cte
    GROUP BY ID,GroupingKey
    ORDER BY ID,GroupingKey;
    

    The idea in short:

    • we use APPLY to add your string casted to XML to the result set. This will help to split the string ("a;b;c" => <x>a</x><x>b</x><x>c</x>)
    • We use another APPLY to create a tally on the fly with a computed TOP-clause. It will return as many virtual rows as there are elements in the XML
    • We use sql:column() to grab each element's value by its position and some simple maths to create a grouping key and a running number from 0 to 4 and so on.
    • We use GROUP BY together with MAX(CASE...) to place the values in the fitting column (old-fashioned pivot or conditional aggregation).

    Hint: If you want this fully generically, with a number of columns not knwon in advance. You cannot use any kind of function or ad-hoc query. You would rather need some kind of dynamic statement creation together with EXEC within a stored procedure. to be honest: This might be a case of XY-problem. Such approaches are the wrong idea - at least in almost all situations I can think of.

    UPDATE for SQL-Server 2017+

    You are on v2017, this allows for JSON, which is a bit faster in position safe string splitting. Try this:

        SELECT t.ID
              ,A.*
        FROM @tbl t
        CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
    

    The general idea is the same. We transform a string to a JSON-array ("a,b,c" => ["a","b","c"]) and read it with APPLY OPENJSON(). You can perform the same maths at the "key" column and do the rest as above.

    Just because it is ready here, this is the full query for v2017+

    WITH cte AS
    (
        SELECT t.ID
              ,A.[key]+1 AS Nr
              ,A.[value] AS ValueAtPosition
              ,A.[key] % 5 AS Position
              ,A.[key]/5 AS GroupingKey 
        FROM @tbl t
        CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
    )
    SELECT ID
          ,GroupingKey
          ,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
          ,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
          ,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
          ,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
          ,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
    FROM cte
    GROUP BY ID,GroupingKey
    ORDER BY ID,GroupingKey;