I have a table like this:
ID cst
1 string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8,string9, 65
2 string10;-3;string11;string12;56;string13;6;string14;string15;9
etc.
Now I want to split the cst column into 5 columns and multiple rows. So like this:
ID C1 C2 C3 C4 C5
1 string1 3 string2 string3 34
1 string4 -1 string5 string6 12
1 string7 5 string8 string9 65
2 string10 -3 string11 string12 56
2 string13 6 string14 string15 9
etc.
How to accomplish this? I am on SQL-server 2017, so I can use the string_split function. The problem with this function is that it produces only one output column...
Preferably I would like yo create an UDF that outputs a table. The function would use these input parameters: the string, the separator character, the number of columns. So the function can be used dynamically with a varying number of columns.
ps. the strings can be of variable length of course.
Try it along this:
Hint: There are some "normal" commas in your sample data. I suspected these as wrong and used semicolons. If this is wrong, you might use a general REPLACE() to use ";" instead of ",".
Create a declared table to simulate your issue
DECLARE @tbl TABLE(ID INT, cst VARCHAR(1000));
INSERT INTO @tbl(ID,cst)
VALUES(1,'string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8;string9; 65')
,(2,'string10;-3;string11;string12;56;string13;6;string14;string15;9');
--The query (for almost any version of SQL-Server, find v2017+ as UPDATE below)
WITH cte AS
(
SELECT t.ID
,B.Nr
,A.Casted.value('(/x[sql:column("B.Nr")]/text())[1]','varchar(max)') AS ValueAtPosition
,(B.Nr-1) % 5 AS Position
,(B.Nr-1)/5 AS GroupingKey
FROM @tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.cst,';','</x><x>') + '</x>' AS XML)) A(Casted)
CROSS APPLY(SELECT TOP(A.Casted.value('count(x)','int')) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM master..spt_values) B(Nr)
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;
The idea in short:
APPLY
to add your string casted to XML to the result set. This will help to split the string ("a;b;c" => <x>a</x><x>b</x><x>c</x>
)APPLY
to create a tally on the fly with a computed TOP
-clause. It will return as many virtual rows as there are elements in the XMLsql:column()
to grab each element's value by its position and some simple maths to create a grouping key and a running number from 0 to 4 and so on.GROUP BY
together with MAX(CASE...)
to place the values in the fitting column (old-fashioned pivot or conditional aggregation).Hint: If you want this fully generically, with a number of columns not knwon in advance. You cannot use any kind of function or ad-hoc query. You would rather need some kind of dynamic statement creation together with EXEC
within a stored procedure.
to be honest: This might be a case of XY-problem. Such approaches are the wrong idea - at least in almost all situations I can think of.
UPDATE for SQL-Server 2017+
You are on v2017, this allows for JSON, which is a bit faster in position safe string splitting. Try this:
SELECT t.ID
,A.*
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
The general idea is the same. We transform a string to a JSON-array ("a,b,c" => ["a","b","c"]) and read it with APPLY OPENJSON()
.
You can perform the same maths at the "key" column and do the rest as above.
Just because it is ready here, this is the full query for v2017+
WITH cte AS
(
SELECT t.ID
,A.[key]+1 AS Nr
,A.[value] AS ValueAtPosition
,A.[key] % 5 AS Position
,A.[key]/5 AS GroupingKey
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;