Search code examples
sqlsql-serversql-server-2017

Split one column with weird string into multiple columns by specific delimiter in single select sql


I am seeking/hoping for a simpler solution, although I got a working solution already.

But it is hard for me to accept, that this is the only way. Therefore my hope is, that someone who is a good sql poweruser may have a better idea.

Background:

A simple table looking like that:

weirdstring ID
A;GHL+BH;BC,NA-NB,[AB] 1
B;GHL+BH;BC,NA-NB,[AB] 2
C;GHL+BH;BC,NA-NB,[AB] 3
CREATE TABLE TESTTABLE (weirdstring varchar(MAX),
                        ID int);
INSERT INTO TESTTABLE
VALUES ('A;GHL+BH;BC,NA-NB,[AB]', 1);
INSERT INTO TESTTABLE
VALUES ('B;GHL+BH;BC,NA-NB,[AB]', 2);
INSERT INTO TESTTABLE
VALUES ('C;GHL+BH;BC,NA-NB,[AB]', 3);

All I need in the end is the first 3 "letter-groups" (1-3 letterst) from weirdstring (eg.ID 1 = A,GHL and BH, the rest of the string is not important now) in seperate columns:

ID weirdstring group1 group2 group3
1 A;GHL+BH;BC,NA-NB,[AB] A GHL BH
2 B;GHL+BH;BC,NA-NB,[AB] B GHL BH
3 C;GHL+BH;BC,NA-NB,[AB] C GHL BH

What have been done so far is:

  1. change all weird delimiters(;+- and potential more) in the string to comma, eliminate the brackets around "letter-groups". REPLACE daisy-chained is being used. So from A;GHL+BH;BC,NA-NB,[AB] to A,GHL,BH,BC,NA,NB,AB first.

  2. split the new string to columns by comma as delimiter.

The query used is:

SELECT t1.ID,
       t1.weirdstring,
       t2.group1,
       t2.group2,
       t2.group3
FROM TESTTABLE t1
     LEFT JOIN (SELECT grp1.ID,
                       grp1.weirdstring AS group1,
                       grp2.weirdstring AS group2,
                       grp3.weirdstring AS group3
                FROM (SELECT ID,
                             weirdstring
                      FROM (SELECT ID,
                                   weirdstring,
                                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
                            FROM (SELECT ID,
                                         value AS weirdstring
                                  FROM TESTTABLE
                                       CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
                                  WHERE weirdstring IS NOT NULL) splitted ) s1
                      WHERE ROWNUM = 1) grp1
                     LEFT JOIN (SELECT ID,
                                       weirdstring
                                FROM (SELECT ID,
                                             weirdstring,
                                             ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
                                      FROM (SELECT ID,
                                                   value AS weirdstring
                                            FROM TESTTABLE
                                                 CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
                                            WHERE weirdstring IS NOT NULL) splitted ) s2
                                WHERE ROWNUM = 2) grp2 ON grp1.ID = grp2.ID
                     LEFT JOIN (SELECT ID,
                                       weirdstring
                                FROM (SELECT ID,
                                             weirdstring,
                                             ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
                                      FROM (SELECT ID,
                                                   value AS weirdstring
                                            FROM TESTTABLE
                                                 CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
                                            WHERE weirdstring IS NOT NULL) splitted ) s3
                                WHERE ROWNUM = 3) grp3 ON grp3.ID = grp2.ID) t2 ON t1.ID = t2.ID;

But I could not believe how much of a query have been created in the end for my small task. At least I believe its small. I am on an older version (14) of sql-server and therefore I cannot use string_split with its third parameter (enable-ordinal) Syntax:

STRING_SPLIT ( string , separator [ , enable_ordinal ] )

Note: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver16 : The enable_ordinal argument and ordinal output column are currently supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only). Beginning with SQL Server 2022 (16.x) Preview, the argument and output column are available in SQL Server.

Is there some other, shorter ways to achieve the same results? I know that topic has been discussed many many times, but I could not find a solution to my specific problem here. Thanks in advance for any kind of help!


Solution

  • It seems that you are using SQL Server 2017 (v.14), so a possible option is the following JSON-based approach. The idea is to transform the stored text into a valid JSON array (A;GHL+BH;BC,NA-NB,[AB] into ["A","GHL","BH","BC","NA","NB","AB"]) using TRANSLATE() for character replacement and get the expected parts of the string using JSON_VALUE():

    SELECT
       weirdstring,
       JSON_VALUE(jsonweirdstring, '$[0]') AS group1,
       JSON_VALUE(jsonweirdstring, '$[1]') AS group2,
       JSON_VALUE(jsonweirdstring, '$[2]') AS group3
    FROM (  
       SELECT 
          weirdstring,
          CONCAT('["', REPLACE(TRANSLATE(weirdstring, ';+-,[]', '######'), '#', '","'), '"]') AS jsonweirdstring
       FROM TESTTABLE
    ) t