Search code examples
sql-serversql-server-2012

Most efficient way to split string into rows


I am using the following function to split a string into rows. It is much faster than the previous function that I was using, however I need to somehow churn through this data quicker (its an ETL job):

ALTER FUNCTION [dbo].[ArrayToTable]
(
     @InputString VARCHAR(MAX) = ''
     , @Delimitter VARCHAR(1) = ',' 
) 
RETURNS @RESULT TABLE([Position] INT IDENTITY, [Value] VARCHAR(MAX)) 
AS 
BEGIN 
    DECLARE @XML XML 
    SELECT @XML = CONVERT(XML, SQL_TEXT) 
    FROM ( 
        SELECT '<root><item>' 
            + REPLACE(@InputString, @Delimitter, '</item><item>') 
            + '</item></root>' AS SQL_TEXT 
        ) dt 

    INSERT INTO @RESULT([Value]) 
    SELECT t.col.query('.').value('.', 'VARCHAR(1000)') AS [Value] 
    FROM @XML.nodes('root/item') t(col) 
    RETURN 
END 

Can anyone think of a better/quicker way to turn a delimited string into rows? I am using a cross apply on my query to join to these results.

Can anyone think of a more efficient method to turn delimited strings into rows?


Solution

  • Here's the most performant function I have:

    CREATE FUNCTION [Resource].[udf_SplitByXml]
          (@Data NVARCHAR(MAX), @Delimiter NVARCHAR(5))
    RETURNS @Table TABLE 
        ( Data NVARCHAR(MAX)
        , SequentialOrder INT IDENTITY(1, 1))
    AS
    BEGIN
    
        DECLARE @TextXml XML;
        SELECT @TextXml = CAST('<d>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Data, '&', '&amp;'), '<', '&lt;'), '>', '&gt;'), '"', '&quot;'), '''', '&apos;'), @Delimiter, '</d><d>') + '</d>' AS XML);
    
        INSERT INTO @Table (Data)
        SELECT Data = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(RTRIM(LTRIM(T.split.value('.', 'nvarchar(max)'))), '&amp;', '&'), '&lt;', '<'), '&gt;', '>'), '&quot;', '"'), '&apos;', '''')
        FROM @TextXml.nodes('/d') T(Split)
    
        RETURN
    END
    

    Here are sample calls you can use to test the results:

    SELECT * FROM Resource.udf_SplitByXml('yes, no, maybe, so', ',');
    SELECT * FROM Resource.udf_SplitByXml('who|what|where|when|why|how|Uh, I don''t know!', '|');
    SELECT * FROM Resource.udf_SplitByXml('Government, Education, Non-profit|Energy & Power|Yes|No', '|');
    SELECT * FROM Resource.udf_SplitByXml('Energy & Power|Some<Thing>Wicked''This"Way Comes', '|');
    

    Another option is to try out the CLR solution based on Adam Machanic's code that was the winner of a performance test in this blog.