I found some T-SQL code, and i'm trying to figure out if it is:
We have a requirement to try to cleanup names:
Bob
→ Robert
Bill
→ William
Dick
→ Richard
So a table was created to hold the value and the value to replace it with. It holds the above words (as well as more esoteric, domain-specific, industry specific, mappings):
FillerWords:
Value | ReplaceWith |
---|---|
dick | Richard |
bill | William |
bob | Robert |
ted | William |
And the code I saw used to perform the replacment was either genius or madness:
DECLARE @firstName varchar(200) = 'bill';
SELECT @firstName = REPLACE(@firstName, FillerWords.Value, FillerWords.ReplaceWith) FROM FillerWords;
SELECT @firstName;
Which gives:
(no column name)
----------------
William
1 row(s) affected
Even more impressive is that it can do multiple replacements at once:
DECLARE @firstName varchar(200) = 'teddickbobbill';
SELECT @firstName = REPLACE(@firstName, FillerWords.Value, FillerWords.ReplaceWith) FROM FillerWords;
SELECT @firstName;
Which gives:
(no column name)
----------------
WilliamRichardRobertWilliam
1 row(s) affected
It works, but is it valid?
If you eliminate the variable assignment during the select:
SELECT REPLACE(@firstName, fw.Value, fw.ReplaceWith) FROM #FillerWords fw
you see that it is performing the REPLACE
on every row in the table:
(No column name)
teddickRobertbill
tedRichardbobbill
Williamdickbobbill
teddickbobWilliam
And so what happens is that each time a row is evaluated, the @firstName
variable is updated:
@firstName | Value | ReplaceWith | new value of @firstName |
---|---|---|---|
teddickbobbill | dick | Richard | tedRichard bobbill |
tedRichardbobbill | bill | William | tedRichardbobWilliam |
tedRichardbobWilliam | bob | Robert | tedRichardRobert William |
tedRichardRobertWilliam | ted | William | William RichardRobertWilliam |
So it seems to rely on a quirk of variable assignment, that you can keep revising the same variable row-after-row; it's almost like a cursor operation.
Is that style of T-SQL programming supported? I know recursive common-table-expressions (CTEs) do it all the time; but is it allowed here?
I know the code's success relies on the order in which rows are processed, which is generally not guaranteed in SQL. This could lead to inconsistent results, especially if the underlying data or the SQL Server's query execution plan changes. I do not care about this, here, in this question.
I asked one oratrice mechanique d'analyse cardinal, and it said:
The code exploits a specific behavior of T-SQL where a variable can be updated in a SELECT statement. However, this is not standard SQL behavior and may not be intuitive or maintainable. While this works in current versions of SQL Server, there's no guarantee that such behavior will remain consistent or supported in future versions, as it's not a documented feature for such purposes.
Is setting variables resursively not an accepted pattern?
This falls into the same bucket as the old string aggregation technique which sometimes worked and sometimes didn't (dependent on execution plan) and which has never been a supported technique.
The documentation explicitly warns against "Antipattern use of recursive variable assignment" archive
Your code:
SELECT @firstName = REPLACE(@firstName, ...)
FROM
Is certainly of the warned against structure:
SELECT @Var = <expression containing @Var>
FROM
Definitely not something that should be in production code that you care about as it is not guaranteed to work.
In the case the string concatenation approach "failed" then the resultant value of the variable just contained the result of a single assignment so I would expect there is some possibility of getting a plan that only effectively does a single REPLACE
too.