Search code examples
sqlt-sqlsubstringquery-optimization

Lookup every character in string and replace with character from another string


I have a table with two string type columns named IL and CL. I have to compare both the strings character by character and wherever there is a question mark in column CL string, I have to replace it with the character found in the other column (IL), in the exact same position.

For example, the CL string after comparing with IL and replacing the question marks would be the New CL as shown below

CL IL NEW CL
???-123201-000000-000-??? 104-234561-644221-123-947 104-123201-000000-000-947

I have the below code working, where there is a while loop inside of a cursor going through each character of the string, concatenating into a new string, and then doing an update at the end with the new value. However, performance with this piece of code is horribly slow as the table has 100K+ records and it's looping through each row 25 time (string length). I am looking to see if perhaps there is a way to rewrite this logic set based for performance improvement.

declare CharC insensitive cursor for 
  select ID, IL, CL
  from AcctStrings
  where CL like '%?%' 

open CharC
fetch next from CharC into @ID, @IL, @CL
while @@fetch_status = 0 begin

    set @NewCL = ''
    set @i = 1 

    while @i <= 25 begin
      set @TestChar = substring(@CL,@i,1)
      set @OtherChar = substring(@IL,@i,1)

      if (@TestChar = '?') begin
        set @NewCL = @NewCL + @OtherChar 
      end 
      else 
        set @NewCL = @NewCL + @TestChar

      set @i = @i + 1
    end

    update AcctStrings
      set CL = @NewCL
    where ID = @ID
  end

  fetch next from CharC into @ID, @IL, @CL
end
deallocate CharC

Solution

  • If using SQL Server 2022, you can try the following:

    UPDATE AcctStrings
    SET CL = (
        SELECT STRING_AGG(N.NewC, '') WITHIN GROUP(ORDER BY S.Value)
        FROM GENERATE_SERIES(1, LEN(CL)) S
        CROSS APPLY(SELECT SUBSTRING(CL, S.value, 1) AS C) O
        CROSS APPLY(SELECT CASE WHEN O.C = '?' THEN SUBSTRING(IL, S.value, 1) ELSE O.C END AS NewC) N
    )
    WHERE CL LIKE '%?%'
    

    The above uses a subquery to break the CL string down into individual characters, apply the transform (when needed), and then reconstruct the updated string.

    See this db<>fiddle for a demo.

    Results (with some additional test data):

    CL IL NewCL
    ???-123201-000000-000-??? 104-234561-644221-123-947 104-123201-000000-000-947
    111-222222-333333-444-555 xxx-xxxxxx-xxxxxx-xxx-xxx null
    ????????????????????????? 123-654321-123456-456-789 123-654321-123456-456-789

    The null value indicates that no NewCL value was calculated, because the original value contained no "?"s.

    Addendum:

    For SQL Server 2019, the GENERATE_SERIES() function is unavailable, but there are several alterative techniques available for generating a number sequence - A VALUES subselect, ROW_NUMBER() applied to a sufficiently large row source, or a recursive CTE (Common Table Expression).

    The following uses a recursive CTE to generate a number sequence, that is then limited to the CL string length in the WHERE clause. The remaining logic is the same.

    ;WITH Series AS (
        SELECT 1 AS Value
        UNION ALL
        SELECT S.Value + 1 AS Value
        FROM Series S
        WHERE S.Value < 99 -- At least the maximum expected length of CL
    )
    UPDATE AcctStrings
    SET CL = (
        SELECT STRING_AGG(N.NewC, '') WITHIN GROUP(ORDER BY S.Value)
        FROM Series S
        CROSS APPLY(SELECT SUBSTRING(CL, S.value, 1) AS C) O
        CROSS APPLY(SELECT CASE WHEN O.C = '?' THEN SUBSTRING(IL, S.value, 1) ELSE O.C END AS NewC) N
        WHERE S.Value <= LEN(CL)
    )
    WHERE CL LIKE '%?%'
    

    See this db<>fiddle.

    For 2016, we need to fall back to the old FOR XML PATH('') technique.

    ...
    SET NewCL = (
        SELECT (
            SELECT N.NewC as [text()]
            FROM Series S
            CROSS APPLY(SELECT SUBSTRING(CL, S.value, 1) AS C) O
            CROSS APPLY(SELECT CASE WHEN O.C = '?' THEN SUBSTRING(IL, S.value, 1) ELSE O.C END AS NewC) N
            WHERE S.Value <= LEN(CL)
            ORDER BY S.value
            FOR XML PATH(''), TYPE
        ).value('text()[1]', 'nvarchar(max)')
    )
    ...
    

    Since we are not inserting separators, the STUFF() can be omitted from this use case. Normally, the expression being concatenated defaults to a text() element (plain text), but since it is a direct column reference, an explicit AS [text()] is needed to avoid the value being formatted as <NewC>x</NewC>. The use of , TYPE and .value('text()[1]', 'nvarchar(max)') is best practice to avoid character encoding issues.

    See this 2016 fiddle.