Search code examples
sqlsql-servert-sqltokenize

Can I tokenize a string using t-SQL


I was wondering if I have SQL Server 2008 table that was created like this:

CREATE TABLE tbl (id INT PRIMARY KEY, 
                  dvt NVARCHAR(32), 
                  d0 TINYINT, 
                  d1 TINYINT, 
                  d2 TINYINT);
INSERT INTO tbl (id, dvt, d0, d1, d2) 
 VALUES(1, '1', NULL, NULL, NULL);
INSERT INTO tbl (id, dvt, d0, d1, d2) 
 VALUES(2, '', NULL, NULL, NULL);
INSERT INTO tbl (id, dvt, d0, d1, d2) 
 VALUES(3, '2,5', NULL, NULL, NULL);
INSERT INTO tbl (id, dvt, d0, d1, d2) 
 VALUES(4, '13, 34, 45, 5', NULL, NULL, NULL);
INSERT INTO tbl (id, dvt, d0, d1, d2) 
 VALUES(5, '1,8, 10', NULL, NULL, NULL);

I need to take the string from the 'dvt' column and split it into 'd0', 'd1' and 'd2' columns. The 'dvt' value can be separated by commas.

I can do this using C# and a tokenization function but I was wondering if it's possible to do the same using SQL?

Columns BEFORE:

1, "1",             NULL, NULL, NULL
2, "",              NULL, NULL, NULL
3, "2,5",           NULL, NULL, NULL
4, "13, 34, 45, 5", NULL, NULL, NULL
5, "1,8, 10",       NULL, NULL, NULL

Columns AFTER:

1, "1",             1,    NULL, NULL
2, "",              NULL, NULL, NULL
3, "2,5",           2,    5,    NULL
4, "13, 34, 45, 5", 13,   34,   45  -- 5 is discarded
5, "1,8, 10",       1,    8,    10

Solution

  • The main problem with this type of code is re-use of calculations.

    SQL Server is good at caching results (If you type the exact same CHARINDEX() caluculation 5 times, it only calculates once and re-uses that result 4 times).

    That's little consolation for the poor coder who has to type or maintain that code though.

    SQL Server 2005 onward has CROSS APPLY that does help somewhat. The logic is repeated, but the results can be referenced repeatedly, rather that the calculation typed repeatedly.

    SELECT
      *,
      SUBSTRING(dvt, 1,            ISNULL(comma1.pos-1, LEN(dvt))           ) AS item1,
      SUBSTRING(dvt, comma1.pos+1, ISNULL(comma2.pos-1, LEN(dvt))-comma1.pos) AS item2,
      SUBSTRING(dvt, comma2.pos+1, ISNULL(comma3.pos-1, LEN(dvt))-comma2.pos) AS item3
    FROM
    (
      SELECT 'ab,c,def,hij' AS dvt
      UNION ALL
      SELECT 'xyz,abc'      AS dvt
    )
      AS data
    OUTER APPLY
      (SELECT NULLIF(CHARINDEX(',', data.dvt, 1           ), 0) AS pos                     )  AS comma1
    OUTER APPLY
      (SELECT NULLIF(CHARINDEX(',', data.dvt, comma1.pos+1), 0) AS pos WHERE comma1.pos > 0)  AS comma2
    OUTER APPLY
      (SELECT NULLIF(CHARINDEX(',', data.dvt, comma2.pos+1), 0) AS pos WHERE comma2.pos > 0)  AS comma3
    OUTER APPLY
      (SELECT NULLIF(CHARINDEX(',', data.dvt, comma3.pos+1), 0) AS pos WHERE comma3.pos > 0)  AS comma4
    


    Another option is to simply write a table valued user defined function that does this (even when the result of the function is always one row). Then you simply CROSS APPLY that function.