Search code examples
sqlsql-serversql-server-2016

Splitting a string into n segments of equal length m


I wonder if there is a way, just using a select statement, without involving loop/for/while control statements or creating any function in SQL Server, to split a string into n segments of equal length m?

For example, the string s looks like this:

s='AaaaaBbbbbCccccDdddd' with n=4 and m=5

n and m are known constants and would not change in the context

I would like the result looking like this:

Aaaaa
Bbbbb
Ccccc
Ddddd

Of course, this could be easily done by creating a function with a loop control.

I am just curious if it is possible to get the same results with just a select statement?


Solution

  • I would, personally, use a tally for this. An rCTE works, however, if you are working with larger amounts of data, or (much) longer strings, then the recursiveness of an rCTE will impact performance (I discuss this further here).

    As you state you can't create a function, I use an inline tally instead (which admittedly makes things look more complicated), to put the string into relevant segments. If the string doesn't fit the split evenly, the last value will be shorter.

    DECLARE @s varchar(MAX),
            @m int;
    
    SET @s = 'AaaaaBbbbbCccccDdddd';
    SET @m = 5;
    
    --Data set of NULLs
    WITH N AS(
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    --Build the Tally
    Tally AS(
        SELECT TOP(LEN(@s)) --Limit to length of the string
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
        FROM N N1, N N2, N N3, N N4, N N5, N N6, N N7) --Up to 10,000,000 rows
    SELECT SUBSTRING(@s, T.I, @m) AS sp
    FROM Tally T
    WHERE (T.I - 1) % @m = 0
    ORDER BY T.I;
    

    If this is against a table, it may look something like this:

    CREATE TABLE dbo.YourTable (ArbitraryID int IDENTITY,
                                YourString varchar(MAX),
                                SegmentSize int);
    INSERT INTO dbo.YourTable (YourString, SegmentSize)
    VALUES('AaaaaBbbbbCccccDdddd',5),
          ('1234567890',3);
    GO
    
    WITH N AS(
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    Tally AS(
        SELECT TOP(SELECT MAX(LEN(YourString)) FROM dbo.YourTable)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
        FROM N N1, N N2, N N3, N N4, N N5, N N6, N N7)
    SELECT YT.ArbitraryID,
           SUBSTRING(YT.YourString, T.I, YT.SegmentSize) AS sp
    FROM dbo.YourTable YT
         JOIN Tally T ON LEN(YT.YourString) >= T.I
    WHERE (T.I - 1) % YT.SegmentSize = 0
    ORDER BY YT.ArbitraryID,
             T.I;
    
    GO
    --Clean up
    DROP TABLE dbo.YourTable;
    

    If you are on 2022+, then GENERATE_SERIES significantly simplifies this:

    DECLARE @s varchar(MAX),
            @m bigint; --bigint as we're using a MAX length string. If you aren't, use int
    
    SET @s = '1234567890';
    SET @m = 3;
    
    SELECT SUBSTRING(@s, GS.value, @m)
    FROM GENERATE_SERIES(CONVERT(bigint,1),LEN(@s), @m) GS --All values have to be the same data type, hence explicit CONVERT on the first parameter
    ORDER BY GS.value;
    GO
    -- Or with the prior table
    
    SELECT YT.ArbitraryID,
           SUBSTRING(YT.YourString, GS.value, YT.SegmentSize)
    FROM dbo.YourTable YT
         CROSS APPLY GENERATE_SERIES(CONVERT(bigint,1), LEN(YT.YourString), CONVERT(bigint,YT.SegmentSize)) GS --All values have to be the same data type, hence explicit CONVERTs
    ORDER BY YT.ArbitraryID,
             GS.value;