Search code examples
sqlsql-serverregexsql-server-2014charindex

How to use RegEx in the SQL function CHARINDEX to find the nth occurrence


The intent of the following algorithm is to extract certain set of fields from a key (in this example, extract first 2 fields), and it works. The fields are delimited by a colon:

declare @key nvarchar (max);
declare @pos int;
declare @fields nvarchar (max);

set @key = 'Field-1:Field-2:Field-3:Field-4:Field-5';
set @pos = charindex(':', @key, charindex (':', @key) + 1);
set @fields = left(@key, @pos - 1);

select @fields;

Result: Field-1:Field-2

Microsoft document says that the first parameter is an expression, but I think what they mean by that in the context of CHARINDEX is, that this expression should evaluate to a string literal; hence the following attempt to pass a RegEx to get the 2nd occurrence doesn’t work; obviously either it is not supported or I am using a bad syntax:

--match the second occurrence of the delimiter using RegEx
set @pos = charindex (':.*?(:)', @key);

In other words, is it possible to find the position of the nth occurrence of the delimiter in a given text using RegEx, so that I could avoid several nested CHARINDEX or a loop to parse? Keeping aside, if n is passed as a parameter, then I can't even use static nesting anymore...

Thanks in advance for the help.

Environment: Microsoft SQL Server 2014 (SP3) Standard Edition (64-bit)


Solution

  • There is no easy way, only tricks, to extract nth substring of a string. Below is a set based, recursive CTE approach:

    DECLARE @str NVARCHAR(MAX) = N'Field-1:Field-2:Field-3:Field-4:Field-5';
    DECLARE @num INT = 4;
    
    WITH rcte AS (
        SELECT str = @str
             , n = 1
             , p = CHARINDEX(':', @str, 1)
        UNION ALL
        SELECT str
             , n + 1
             , CHARINDEX(':', str, p + 1)
        FROM rcte
        WHERE n < @num AND p > 0
    )
    SELECT CASE WHEN p > 0 THEN SUBSTRING(str, 1, p - 1) ELSE str END
    FROM rcte
    WHERE n = @num;
    

    If loop is an option then:

    DECLARE @str NVARCHAR(MAX) = N'Field-1:Field-2:Field-3:Field-4:Field-5';
    DECLARE @num INT = 4;
    DECLARE @n INT = 0;
    DECLARE @p INT = 0;
    
    WHILE 1 = 1
    BEGIN
        SET @n = @n + 1;
        SET @p = CHARINDEX(':', @str, @p + 1);
        IF @n = @num OR @p = 0 BREAK;
    END;
    
    SELECT CASE WHEN @p > 0 THEN SUBSTRING(@str, 1, @p - 1) ELSE @str END;
    

    DB<>Fiddle