I have a table with two string type columns named IL and CL. I have to compare both the strings character by character and wherever there is a question mark in column CL string, I have to replace it with the character found in the other column (IL), in the exact same position.
For example, the CL string after comparing with IL and replacing the question marks would be the New CL as shown below
CL | IL | NEW CL |
---|---|---|
???-123201-000000-000-??? | 104-234561-644221-123-947 | 104-123201-000000-000-947 |
I have the below code working, where there is a while
loop inside of a cursor
going through each character of the string, concatenating into a new string, and then doing an update at the end with the new value. However, performance with this piece of code is horribly slow as the table has 100K+ records and it's looping through each row 25 time (string length). I am looking to see if perhaps there is a way to rewrite this logic set based for performance improvement.
declare CharC insensitive cursor for
select ID, IL, CL
from AcctStrings
where CL like '%?%'
open CharC
fetch next from CharC into @ID, @IL, @CL
while @@fetch_status = 0 begin
set @NewCL = ''
set @i = 1
while @i <= 25 begin
set @TestChar = substring(@CL,@i,1)
set @OtherChar = substring(@IL,@i,1)
if (@TestChar = '?') begin
set @NewCL = @NewCL + @OtherChar
end
else
set @NewCL = @NewCL + @TestChar
set @i = @i + 1
end
update AcctStrings
set CL = @NewCL
where ID = @ID
end
fetch next from CharC into @ID, @IL, @CL
end
deallocate CharC
If using SQL Server 2022, you can try the following:
UPDATE AcctStrings
SET CL = (
SELECT STRING_AGG(N.NewC, '') WITHIN GROUP(ORDER BY S.Value)
FROM GENERATE_SERIES(1, LEN(CL)) S
CROSS APPLY(SELECT SUBSTRING(CL, S.value, 1) AS C) O
CROSS APPLY(SELECT CASE WHEN O.C = '?' THEN SUBSTRING(IL, S.value, 1) ELSE O.C END AS NewC) N
)
WHERE CL LIKE '%?%'
The above uses a subquery to break the CL
string down into individual characters, apply the transform (when needed), and then reconstruct the updated string.
See this db<>fiddle for a demo.
Results (with some additional test data):
CL | IL | NewCL |
---|---|---|
???-123201-000000-000-??? |
104-234561-644221-123-947 |
104-123201-000000-000-947 |
111-222222-333333-444-555 |
xxx-xxxxxx-xxxxxx-xxx-xxx |
null |
????????????????????????? |
123-654321-123456-456-789 |
123-654321-123456-456-789 |
The null value indicates that no NewCL
value was calculated, because the original value contained no "?"s.
Addendum:
For SQL Server 2019, the GENERATE_SERIES()
function is unavailable, but there are several alterative techniques available for generating a number sequence - A VALUES
subselect, ROW_NUMBER()
applied to a sufficiently large row source, or a recursive CTE (Common Table Expression).
The following uses a recursive CTE to generate a number sequence, that is then limited to the CL string length in the WHERE
clause. The remaining logic is the same.
;WITH Series AS (
SELECT 1 AS Value
UNION ALL
SELECT S.Value + 1 AS Value
FROM Series S
WHERE S.Value < 99 -- At least the maximum expected length of CL
)
UPDATE AcctStrings
SET CL = (
SELECT STRING_AGG(N.NewC, '') WITHIN GROUP(ORDER BY S.Value)
FROM Series S
CROSS APPLY(SELECT SUBSTRING(CL, S.value, 1) AS C) O
CROSS APPLY(SELECT CASE WHEN O.C = '?' THEN SUBSTRING(IL, S.value, 1) ELSE O.C END AS NewC) N
WHERE S.Value <= LEN(CL)
)
WHERE CL LIKE '%?%'
See this db<>fiddle.
For 2016, we need to fall back to the old FOR XML PATH('')
technique.
...
SET NewCL = (
SELECT (
SELECT N.NewC as [text()]
FROM Series S
CROSS APPLY(SELECT SUBSTRING(CL, S.value, 1) AS C) O
CROSS APPLY(SELECT CASE WHEN O.C = '?' THEN SUBSTRING(IL, S.value, 1) ELSE O.C END AS NewC) N
WHERE S.Value <= LEN(CL)
ORDER BY S.value
FOR XML PATH(''), TYPE
).value('text()[1]', 'nvarchar(max)')
)
...
Since we are not inserting separators, the STUFF()
can be omitted from this use case. Normally, the expression being concatenated defaults to a text()
element (plain text), but since it is a direct column reference, an explicit AS [text()]
is needed to avoid the value being formatted as <NewC>x</NewC>
. The use of , TYPE
and .value('text()[1]', 'nvarchar(max)')
is best practice to avoid character encoding issues.
See this 2016 fiddle.