Search code examples
sqlsql-servercross-applysplit-function

SQL Server: split record


I have a table like this:

account   |   check1          |   check2
1         |   100]200]300     |   101]209]305
2         |   401]502         |   404]511
3         |   600             |   601

I want to separate the records into something like this:

account   |   check1     |   check2
1         |   100        |   101
1         |   200        |   209
1         |   300        |   305
2         |   401        |   404
2         |   502        |   511
.         |     .        |    .
.         |     .        |    .
.         |     .        |    .

How do I do this using SQL server only?

Thanks,


Solution

  • First, you need a split function that can allow you to determine order within the result. This is a multi-statement TVF which uses an IDENTITY column

    CREATE FUNCTION dbo.SplitStrings
    (
        @List       NVARCHAR(MAX),
        @Delimiter  NVARCHAR(255)
    )
    RETURNS @t TABLE(ID INT IDENTITY(1,1), Item INT)
    AS
    BEGIN
        INSERT @t(Item) SELECT SUBSTRING(@List, Number, 
            CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
            FROM sys.all_objects) AS n(Number)
        WHERE Number <= CONVERT(INT, LEN(@List))
            AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
        ORDER BY Number OPTION (MAXDOP 1);
    
        RETURN;
    END
    GO
    

    (If you have a Numbers table, you can use that instead of the subquery, and this will also allow you to add WITH SCHEMABINDING to the function's definition, which provides potential performance benefits.)

    With the function in place, here is sample usage given the data you've provided and desired results:

    DECLARE @x TABLE(account INT, check1 NVARCHAR(1000), check2 NVARCHAR(1000));
    
    INSERT @x SELECT 1, '100]200]300','101]209]305'
    UNION ALL SELECT 2, '401]502','404]511'
    UNION ALL SELECT 3, '600','601'
    UNION ALL SELECT 4, '205]104','304]701'; -- I added this sanity check
    
    SELECT account, check1 = s1.Item, check2 = s2.Item
    FROM @x AS x
    CROSS APPLY dbo.SplitStrings(x.check1, ']') AS s1
    CROSS APPLY dbo.SplitStrings(x.check2, ']') AS s2
    WHERE s1.ID = s2.ID
    ORDER BY account, s1.ID;
    

    Results:

    account  check1  check2
    -------  ------  ------
    1        100     101
    1        200     209
    1        300     305
    2        401     404
    2        502     511
    3        600     601
    4        205     304
    4        104     701
    

    This assumes that you have some kind of validation / enforcement that corresponding values in check1 and check2 columns will always have the same number of values. It also assumes any check1 / check2 value will not exceed about 7,000 characters (again a Numbers table can help make that more flexible).

    EDIT

    After AndriyM's comments I wanted to come back and re-visit this, mostly to supply a version of the above function which works without using a multi-statement TVF. This uses Andriy's idea ROW_NUMBER() could be used.

    CREATE FUNCTION dbo.SplitStrings
    (
        @List       NVARCHAR(MAX),
        @Delimiter  NVARCHAR(255)
    )
    RETURNS TABLE
    AS
        RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
            Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
            CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
            FROM sys.all_objects) AS n(Number)
        WHERE Number <= CONVERT(INT, LEN(@List))
            AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
        ) AS y);
    GO