I am using the following function to split a string into rows. It is much faster than the previous function that I was using, however I need to somehow churn through this data quicker (its an ETL job):
ALTER FUNCTION [dbo].[ArrayToTable]
(
@InputString VARCHAR(MAX) = ''
, @Delimitter VARCHAR(1) = ','
)
RETURNS @RESULT TABLE([Position] INT IDENTITY, [Value] VARCHAR(MAX))
AS
BEGIN
DECLARE @XML XML
SELECT @XML = CONVERT(XML, SQL_TEXT)
FROM (
SELECT '<root><item>'
+ REPLACE(@InputString, @Delimitter, '</item><item>')
+ '</item></root>' AS SQL_TEXT
) dt
INSERT INTO @RESULT([Value])
SELECT t.col.query('.').value('.', 'VARCHAR(1000)') AS [Value]
FROM @XML.nodes('root/item') t(col)
RETURN
END
Can anyone think of a better/quicker way to turn a delimited string into rows? I am using a cross apply
on my query to join to these results.
Can anyone think of a more efficient method to turn delimited strings into rows?
Here's the most performant function I have:
CREATE FUNCTION [Resource].[udf_SplitByXml]
(@Data NVARCHAR(MAX), @Delimiter NVARCHAR(5))
RETURNS @Table TABLE
( Data NVARCHAR(MAX)
, SequentialOrder INT IDENTITY(1, 1))
AS
BEGIN
DECLARE @TextXml XML;
SELECT @TextXml = CAST('<d>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Data, '&', '&'), '<', '<'), '>', '>'), '"', '"'), '''', '''), @Delimiter, '</d><d>') + '</d>' AS XML);
INSERT INTO @Table (Data)
SELECT Data = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(RTRIM(LTRIM(T.split.value('.', 'nvarchar(max)'))), '&', '&'), '<', '<'), '>', '>'), '"', '"'), ''', '''')
FROM @TextXml.nodes('/d') T(Split)
RETURN
END
Here are sample calls you can use to test the results:
SELECT * FROM Resource.udf_SplitByXml('yes, no, maybe, so', ',');
SELECT * FROM Resource.udf_SplitByXml('who|what|where|when|why|how|Uh, I don''t know!', '|');
SELECT * FROM Resource.udf_SplitByXml('Government, Education, Non-profit|Energy & Power|Yes|No', '|');
SELECT * FROM Resource.udf_SplitByXml('Energy & Power|Some<Thing>Wicked''This"Way Comes', '|');
Another option is to try out the CLR solution based on Adam Machanic's code that was the winner of a performance test in this blog.