batch-file replace cmd escaping delayedvariableexpansion

Escaping exclamation marks required in replace string but not in search string (substring replacement with delayed expansion on)?


Suppose one wants to replace certain substrings with exclamation marks using the substring replacement syntax while delayed expansion is enabled. Immediate (normal, percent) expansion has to be used for that, because with delayed expansion the parser cannot distinguish the !s that delimit the variable from literal ones.

However, why does one have to escape exclamation marks in the replacement string? And why is it not necessary and even disruptive when exclamation marks in the search string are escaped?

The following script replaces every ! in a string by ` and then reverses that replacement, so I expect the result to be equal to the initial string (which must not contain any back-ticks of its own, of course):

@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem This is the test string:
set "STRING=string!with!exclamation!marks!"
set "DELOFF=%STRING%"
set "DELOFF=%DELOFF:!=`%"
set "DELOFF=%DELOFF:`=!%"
setlocal EnableDelayedExpansion
set "DELEXP=!STRING!"
set "DELEXP=%DELEXP:!=`%"
set "DELEXP=%DELEXP:`=!%"
echo(original   string: !STRING!
echo(normal  expansion: !DELOFF!
echo(delayed expansion: !DELEXP!
endlocal
endlocal
exit /B

The result is definitely not what I want; the last string is different:

original   string: string!with!exclamation!marks!
normal  expansion: string!with!exclamation!marks!
delayed expansion: stringexclamation

As soon as I take the line...

set "DELEXP=%DELEXP:`=!%"

...and replace the ! by ^! there, hence escaping the exclamation mark in the replace string, the result is exactly what I expect:

original   string: string!with!exclamation!marks!
normal  expansion: string!with!exclamation!marks!
delayed expansion: string!with!exclamation!marks!

When I try other escaping combinations though (escape the exclamation mark in both the replace and the search string, or in the latter only), the result is again the aforementioned unwanted one.

I walked through the post How does the Windows Command Interpreter (CMD.EXE) parse scripts? but I could not find an explanation for this behaviour, because I learned that normal (or immediate, percent) expansion is accomplished long before delayed expansion occurs and any exclamation marks are even recognised. Also, caret recognition and escaping seem to happen afterwards. In addition, there are even quotation marks around the strings, which usually hide carets from the parser.


Solution

  • Actually, for the substring replacement itself there is no escaping required. It becomes necessary for the later parsing phases only. This is why:

    However, why does one have to escape exclamation marks in the replacement string?

    The thing is that immediate (normal, %) expansion is done at quite an early stage, whereas delayed expansion (!), as the name implies, is accomplished as one of the last steps. Hence an immediately expanded string also passes through the delayed expansion phase. As proof, set the variable VAR to the literal value Value!X! and X to 0, then execute echo %VAR% with delayed expansion enabled; you will get Value0 as the result.
    But back to the initial question: when using immediate substring replacement, the replacement string is part of the expanded value, so it is also passed through the delayed expansion phase. Therefore, a literal exclamation mark must be escaped in order not to be consumed by delayed expansion. This implies that the escaping is not needed for the replacement itself. It takes effect afterwards, so the given replace string, including the caret, is inserted literally.
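
    A minimal sketch of that proof as a batch file. The file layout and the setlocal lines are assumptions added here; VAR is assigned while delayed expansion is still disabled so that the value Value!X! is stored literally:

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    rem Delayed expansion is off here, so the exclamation marks are stored as they are.
    set "VAR=Value!X!"
    setlocal EnableDelayedExpansion
    set "X=0"
    rem Percent expansion inserts the stored value first, delayed expansion then resolves X.
    echo %VAR%
    endlocal
    endlocal
    exit /B
    ```

    This should print Value0, which shows that the text produced by percent expansion is scanned again for exclamation marks.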

    And why is it not necessary and even disruptive when exclamation marks in the search string are escaped?

    Since caret recognition, and therefore escaping, happens after immediate expansion, the search string is taken literally. Furthermore, the search string is replaced and is therefore not part of the output of the immediate substring replacement, so it never reaches the delayed expansion phase.


    Let us look at the original example (excerpt only):

    set "STRING=string!with!exclamation!marks!"
    setlocal EnableDelayedExpansion
    set "DELEXP=!STRING!"
    set "DELEXP=%DELEXP:!=`%"
    set "DELEXP=%DELEXP:`=!%"
    echo(delayed expansion: !DELEXP!
    endlocal
    

    The replacement set "DELEXP=%DELEXP:!=`%" searches for !. The resulting value is string`with`exclamation`marks`.

    Using set "DELEXP=%DELEXP:^!=`%" would search for ^! literally, so of course no occurrences would be found. All the literal ! of the original string would therefore remain in the expanded value and would finally be consumed by delayed expansion.

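    The replacement set "DELEXP=%DELEXP:`=!%" replaces ` by ! perfectly, so the expanded string is string!with!exclamation!marks!, but these exclamation marks are consumed by delayed expansion afterwards: !with! and !marks! are read as references to undefined variables and expand to nothing, which leaves stringexclamation.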

    The escaped replacement %DELEXP:`=^!% replaces ` by ^! literally, so the expanded string is string^!with^!exclamation^!marks^!; the escaping is processed afterwards, during the delayed expansion phase, which turns each ^! into a literal ! and finally yields string!with!exclamation!marks!.
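
    The steps above can be put together as follows. This is a sketch based on the script from the question; the two intermediate echo lines and their labels are additions for illustration only:

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "STRING=string!with!exclamation!marks!"
    setlocal EnableDelayedExpansion
    set "DELEXP=!STRING!"
    rem The search string is taken literally, so it must not be escaped.
    set "DELEXP=%DELEXP:!=`%"
    echo(after step 1: !DELEXP!
    rem The replace string ends up in the expanded line, so it needs the caret.
    set "DELEXP=%DELEXP:`=^!%"
    echo(after step 2: !DELEXP!
    endlocal
    endlocal
    exit /B
    ```

    Going by the results reported in the question, the first line should show string`with`exclamation`marks` and the second one string!with!exclamation!marks!.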


    According to the post How does the Windows Command Interpreter (CMD.EXE) parse scripts?, there is a second phase where escaping occurs, which is the delayed expansion phase. This is the one that applies to the example in the original question, because the first escaping (during the special character recognition phase) is disabled by the surrounding quotation marks (omitting them would require double-escaping, like ^^!).
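
    To illustrate the difference between the quoted and the unquoted form, here is a small sketch. The variable names QUOTED and UNQUOTED and the test string are made up for this example, and the expected behaviour follows from the parsing rules in the cited post rather than from an exhaustive test:

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "STRING=string`with`backticks`"
    setlocal EnableDelayedExpansion
    rem Quoted assignment, a single caret is enough.
    set "QUOTED=%STRING:`=^!%"
    rem Unquoted assignment, the first escaping phase consumes one caret, so two are needed.
    set UNQUOTED=%STRING:`=^^!%
    echo(quoted:   !QUOTED!
    echo(unquoted: !UNQUOTED!
    endlocal
    endlocal
    exit /B
    ```

    Both lines should show string!with!backticks!. Note that the unquoted set line must not carry trailing spaces, since they would become part of the value.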