Tags: windows, for-loop, batch-file, escaping, special-characters

How to pass a command that may contain special characters (such as % or !) inside a variable to a for /f loop?


I have a few nested loops in my code, and at some point they're separated by a call to a label, like this:

@echo off
chcp 65001
for /r %%a in (*.mkv *.mp4 *.avi *.mov) do (
    echo Processing "%%~a"
    call :innerloop "%%a" "%%~fa"
)
:: Instead of goto :eof, I chose cmd /k because I want the command prompt to still be open after the script is done, not sure if this is correct though
cmd /k

:innerloop
setlocal EnableExtensions EnableDelayedExpansion
for /f "delims=" %%l in ('mkvmerge.exe -i "%~1"') do (
:: Probably this would be a safer place for setlocal, but I believe that would mean that I wouldn't get to keep a single, different !propeditcmd! per processed file
    echo Processing line "%%~l"
    for /f "tokens=1,4 delims=: " %%t in ("%%l") do (
:: This section checks for mkv attachments. There are similar checks for chapters and global tags, all of those are handled by mkvpropedit.exe
        if /i "%%t" == "Attachment" (
            if not defined attachments (
                set /a "attachments=1"
            ) else (
                set /a "attachments+=1"
            )
            if not defined propeditcmd (
                set "propeditcmd= --delete-attachment !attachments!"
            ) else (
                set "propeditcmd=!propeditcmd! --delete-attachment !attachments!"
            )
        )
    )
)
:: Since !propeditcmd! (which contains the parameters to be used with the executable) is called after all lines are processed, I figured setlocal must be before the first loop in this label
if defined propeditcmd (
    mkvpropedit.exe "%~f1" !propeditcmd!
)
endlocal
goto :eof

The script is divided like that so the inner loop can be exited without exiting the outer loop when a pass is reached. While it works for most files, I noticed it can't handle file names containing special characters such as % or !, likely due to EnableDelayedExpansion.

Normally, I know I would have to escape these characters with a caret (^), but I don't know how I can do it if the special characters are inside a variable (%~1).

Is there a way to do it?

Update: I've been working on a way to separate the section that needs delayed expansion enabled from the one that needs it disabled, only to find at the end of my code the line mkvpropedit.exe "%~f1" !propeditcmd!, which needs it both off and on, because of "%~f1" and !propeditcmd! respectively. I think this means there's no way around the question and escaping will be necessary.

Continuing my research, this answer seems to suggest this could be achieved with something like set filename="%~1:!=^^!". However, this doesn't seem to be the proper syntax according to SS64. I'm also unsure whether this would replace all occurrences of ! with ^!, and I'm concerned this kind of substitution could create an infinite loop. Wouldn't it be more adequate to first replace ! with, say, ¬ and then replace that with ^!?
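For illustration only, here is a small Python model (not batch code) of the substitution semantics. According to set /?, %VAR:str1=str2% substitutes each occurrence of str1 in the expanded value with str2. The inserted text is not scanned again, which is the same single-pass behavior as Python's str.replace, so replacing ! with ^! cannot loop and no intermediate character like ¬ is needed. Note that this syntax only works on environment variables, not directly on %~1, so the argument would first have to be assigned with set. The helper name cmd_substitute is hypothetical.

```python
def cmd_substitute(value, old, new):
    """Hypothetical model of %VAR:old=new%.

    Every occurrence of `old` is replaced in one left-to-right pass and the
    inserted text is never rescanned, so the result cannot loop even though
    `new` itself contains `old`. (cmd.exe matches `old` case-insensitively;
    that is irrelevant for "!" and not modelled here.)
    """
    return value.replace(old, new)


file_name = "Movie! (Director's Cut)!.mkv"
print(cmd_substitute(file_name, "!", "^!"))  # Movie^! (Director's Cut)^!.mkv
```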

While I intend to do testing soon to determine all of this, I'm worried I may not cover it all, so more input would definitely be appreciated.


PS: The full code (88 lines) is available here if more context is needed, although I'll edit the snippet in this question if requested!

Edit: I didn't think it was relevant at first, but now I think it helps to know what the standard output of mkvmerge.exe -i looks like:

File 'test.mkv': container: Matroska
Track ID 0: video (AVC/H.264/MPEG-4p10)
Track ID 1: audio (Opus)
Track ID 2: subtitles (SubRip/SRT)
Attachment ID 1: type 'image/jpeg', size 30184 bytes, file name 'test.jpg'
Attachment ID 2: type 'image/jpeg', size 30184 bytes, file name 'test2.jpg'
Attachment ID 3: type 'image/jpeg', size 30184 bytes, file name 'test3.jpg'
Chapters: 5 entries
Global tags: 3 entries
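To see how the question's for /f "tokens=1,4 delims=: " loop splits this output, here is a rough Python model. The helper names are hypothetical, and only the delimiter handling of for /F is modelled: leading delimiters are skipped and a run of delimiter characters acts as one separator. It counts the lines whose first token is Attachment, which is what the if /i "%%t" == "Attachment" check does.

```python
def for_f_tokens(line, delims=": "):
    """Hypothetical model of for /F token splitting with the given delims."""
    tokens, current = [], ""
    for ch in line:
        if ch in delims:
            if current:  # a run of delimiters ends at most one token
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


MKVMERGE_OUTPUT = """\
File 'test.mkv': container: Matroska
Track ID 0: video (AVC/H.264/MPEG-4p10)
Track ID 1: audio (Opus)
Track ID 2: subtitles (SubRip/SRT)
Attachment ID 1: type 'image/jpeg', size 30184 bytes, file name 'test.jpg'
Attachment ID 2: type 'image/jpeg', size 30184 bytes, file name 'test2.jpg'
Attachment ID 3: type 'image/jpeg', size 30184 bytes, file name 'test3.jpg'
Chapters: 5 entries
Global tags: 3 entries
"""


def count_attachments(output):
    count = 0
    for line in output.splitlines():
        tokens = for_f_tokens(line)
        # if /i "%%t" == "Attachment": case-insensitive compare of token 1
        if tokens and tokens[0].lower() == "attachment":
            count += 1
    return count


print(for_f_tokens("Track ID 1: audio (Opus)")[3])  # audio
print(count_attachments(MKVMERGE_OUTPUT))           # 3
```

Token 4 is the track type (video, audio, subtitles) on the Track lines, which is why tokens=1,4 is used.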

Solution

  • There is not really a need for a subroutine. Delayed variable expansion is needed in the end, but it is possible to first assign the fully qualified file name to an environment variable like FileName while delayed expansion is still disabled, to avoid trouble with file names containing an exclamation mark.

    Here is the code from the question, rewritten, with some comments:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "WindowTitle=%~n0"
    
    rem Find out if the batch file was started with a double click which means
    rem with starting cmd.exe with option /C and the batch file name appended
    rem as argument. In this case start one more Windows command processor
    rem with the option /K and the batch file name to keep the Windows command
    rem processor running after finishing the processing of this batch file
    rem and exit the current command processor processing this batch file.
    rem This code does nothing if the batch file is executed from within a
    rem command prompt window or it was restarted with the two options /D /K.
    
    setlocal EnableDelayedExpansion
    for /F "tokens=1,2" %%G in ("!CMDCMDLINE!") do (
        if /I "%%~nG" == "cmd" if /I "%%~H" == "/c" (
            endlocal
            start %SystemRoot%\System32\cmd.exe /D /K %0
            if not errorlevel 1 exit /B
            setlocal EnableDelayedExpansion
        )
    )
    
    rem Set the console window title to the batch file name.
    title !WindowTitle!
    endlocal
    set "WindowTitle="
    
    rem Get the number of the current code page and change the code page
    rem to 65001 (UTF-8). The initial code page is restored at end.
    for /F "tokens=*" %%G in ('%SystemRoot%\System32\chcp.com') do for %%H in (%%G) do set "CodePage=%%~nH"
    %SystemRoot%\System32\chcp.com 65001 >nul 2>&1
    
    for /R %%G in (*.mkv *.mp4 *.avi *.mov) do (
        echo(
        echo Processing "%%~G"
        set "Attachments="
        for /F "delims=" %%L in ('mkvmerge.exe -i "%%G"') do (
            rem echo Processing line "%%L"
            for /F "delims=: " %%I in ("%%L") do if /I "%%I" == "Attachment" set /A Attachments+=1
        )
        if defined Attachments (
            set "FileName=%%G"
            setlocal EnableDelayedExpansion
            set "propeditcmd=--delete-attachment 1"
            for /L %%I in (2,1,!Attachments!) do set "propeditcmd=!propeditcmd! --delete-attachment %%I"
            mkvpropedit.exe "!FileName!" !propeditcmd!
            endlocal
        )
    )
    
    rem Restore the initial code page.
    %SystemRoot%\System32\chcp.com %CodePage% >nul
    endlocal
    

    Why is the window title passed to TITLE using delayed expansion?

    An argument string must be enclosed in " if, after the expansion of dynamic variable, environment variable, loop variable or batch file argument references, it contains a space or one of these characters &()[]{}^=;!'+,`~<>| and all these characters should be interpreted literally by the Windows command processor cmd.exe. For that reason the third line encloses the argument string WindowTitle=%~n0 in double quotes: %~n0 references the batch file name without file extension and without path, which could contain, for example, an ampersand, although that would be a very unusual file name for a batch file.

    See also: How does the Windows Command Interpreter (CMD.EXE) parse scripts?
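    A tiny Python sketch of the quoting rule stated above: the character set is copied from that paragraph, and the helper needs_quotes is hypothetical. It only decides whether quoting is needed; it does not model how cmd.exe actually parses the line.

```python
# Characters listed above that require an argument to be enclosed in
# double quotes to be taken literally (the space is included as well).
NEEDS_QUOTES = set(" &()[]{}^=;!'+,`~<>|")


def needs_quotes(expanded_argument):
    """Hypothetical helper: True if the expanded argument must be quoted."""
    return any(ch in NEEDS_QUOTES for ch in expanded_argument)


print(needs_quotes("ConvertVideos"))       # False
print(needs_quotes("Videos & Subtitles"))  # True
print(needs_quotes("Backup(2)"))           # True
```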

    The command TITLE behaves like the command ECHO with respect to double quotes: it always interprets them as literal characters and does not remove them from the argument string. So using title "%WindowTitle%" would result in a console window title that starts and ends with a double quote, which would not look nice. Therefore the batch file name should be passed as window title to the cmd.exe internal command TITLE without double quotes. But that is problematic if the batch file name contains a character like & which has a special meaning for cmd.exe when it processes the command line before executing TITLE. For that reason delayed variable expansion is enabled and used here to reference the batch file name assigned to the environment variable WindowTitle, which makes it possible to set the window title exactly to the batch file name.

    Why is the current code page determined and restored at the end?

    A well-written batch file that is used by many people and changes something in the execution environment should always restore the initial execution environment, unless the batch file is explicitly designed to define the execution environment for applications and scripts executed after the batch file has finished.

    What does that mean for batch file development?

    The following properties of the execution environment should be the same after the execution of a batch file has finished as they were when the batch file was started:

    1. the list of environment variables and their values;
    2. the status of command extensions;
    3. the status of delayed expansion;
    4. the current directory;
    5. the command prompt;
    6. text and background color;
    7. the code page to use for character encoding;
    8. the number of rows and columns of the console window.

    The first four properties of the execution environment remain unmodified when SETLOCAL is used at the top of the batch file and, optionally, ENDLOCAL at the bottom. An explicit ENDLOCAL at the bottom of a batch file is optional because cmd.exe implicitly runs it for each SETLOCAL without a matching executed ENDLOCAL before it exits the processing of a batch file, independent of the cause of exiting.

    See also: How to pass environment variables as parameters by reference to another batch file?
    It explains in full details what happens on each execution of SETLOCAL and ENDLOCAL.
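    A toy Python model of the behavior described above, covering environment variables only (the real commands also save and restore the delayed expansion and command extensions states and the current directory, and cmd.exe variable names are case-insensitive; none of that is modelled). The class CmdEnvironment is hypothetical.

```python
class CmdEnvironment:
    """Hypothetical toy model of SETLOCAL/ENDLOCAL for environment variables."""

    def __init__(self, variables):
        self.variables = dict(variables)
        self._saved = []

    def setlocal(self):
        self._saved.append(dict(self.variables))  # save a copy of the list

    def endlocal(self):
        if self._saved:  # ENDLOCAL without a matching SETLOCAL is ignored
            self.variables = self._saved.pop()

    def exit_batch(self):
        # cmd.exe performs an implicit ENDLOCAL for every SETLOCAL still active
        while self._saved:
            self.endlocal()


env = CmdEnvironment({"PATH": r"C:\Windows"})
env.setlocal()
env.variables["Attachments"] = "3"
env.setlocal()
env.variables["propeditcmd"] = "--delete-attachment 1"
env.exit_batch()  # no explicit ENDLOCAL in the batch file
print(env.variables)  # {'PATH': 'C:\\Windows'}
```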

    Each successfully executed PUSHD should be matched by a POPD to restore the initial current directory.

    The command prompt needs to be restored only if it was changed with the command PROMPT, which most batch files don't do at all.

    A batch file that changes the code page with CHCP should run CHCP once again at the end to restore the original code page. The same should be done when using the command COLOR to change the text and background colors, and the command MODE to change the rows and columns of the console window.

    See the DosTips forum topic [Info] Saving current codepage, especially the post written by Compo, for an explanation of how to get the current code page number assigned to an environment variable, which is used at the end of the batch file to restore the initial code page.

    It is a bit difficult to understand why the current code page number is determined with two FOR loops, the second of which uses the modifier %~n although the output of chcp.com is definitely not a file name. So let us look at what happens on a German Windows, on which the command CHCP outputs the string:

    Aktive Codepage: 850.
    

    The dot at the end of the output is not wanted, just the code page number, as on English Windows, where the output is:

    Active code page: 850
    

    See the referenced DosTips topic for other variants depending on the language of Windows.

    The output of chcp.com is first assigned completely to the loop variable G, with leading normal spaces and horizontal tabs removed in case chcp.com outputs the code page information with leading spaces/tabs. The second FOR loop processes this list of words using normal space, comma, semicolon, equal sign and the OEM encoded no-break space as word delimiters.

    The second FOR loop runs the command SET for German code page information three times with the strings:

    1. Aktive
    2. Codepage:
    3. 850.

    The modifier %~n now results three times in cmd.exe accessing the file system and searching the current directory for a file with the string assigned to the loop variable H as file name. There is most likely no file Aktive; Codepage: with the colon at the end is an invalid file name; and a file 850 (the trailing dot is removed by the Windows file I/O API functions) is most likely also not found in the current directory. However, it does not really matter whether a file system entry matches one of the three strings by chance, because %~n results in using just the string from the beginning up to the character before the last dot. So the command SET is executed first with Aktive, a second time with Codepage: and finally a third time with 850. The environment variable CodePage is therefore finally defined with just the number 850.
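    The string handling described above can be sketched in Python. This is a rough model only: tilde_n and code_page_via_tilde_n are hypothetical names, the real %~n modifier first resolves a full path (and accesses the file system), and the OEM no-break space delimiter is not modelled.

```python
def tilde_n(word):
    # Rough model of %~n for a bare word: drop everything from the last dot on.
    stem, dot, _ = word.rpartition(".")
    return stem if dot else word


def code_page_via_tilde_n(chcp_output):
    # for %%H in (%%G): split on spaces, tabs, commas, semicolons, equal signs
    for sep in ",;=":
        chcp_output = chcp_output.replace(sep, " ")
    code_page = None
    for word in chcp_output.split():
        code_page = tilde_n(word)  # set "CodePage=%%~nH" overwrites each time
    return code_page


print(code_page_via_tilde_n("Aktive Codepage: 850."))  # 850
print(code_page_via_tilde_n("Active code page: 850"))  # 850
```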

    Description of the main FOR loops processing the video files

    The outermost FOR always assigns the name of the found file with full path and without surrounding " to the specified loop variable G because it uses the option /R. For that reason just "%%G" is used instead of "%%~G" wherever the fully qualified file name must be referenced, which speeds up the processing of the file names.

    echo( outputs an empty line, see the DosTips forum topic ECHO. FAILS to give text or blank line - Instead use ECHO/

    If an undefined environment variable like Attachments is referenced in an arithmetic expression evaluated by SET, the value 0 is used as explained by the usage help output on running set /? in a command prompt window. For that reason set /A Attachments+=1 can be used to either define the variable with 1 on first execution or increment the value of environment variable Attachments by one on all further executions for the current file.

    The final value of the environment variable Attachments is evaluated after processing all lines output by mkvmerge. If there are attachments, the file name is assigned to the environment variable FileName while delayed variable expansion is still disabled, so ! is interpreted as a literal character. Next the environment variable propeditcmd is created dynamically according to the number of attachments.
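    A minimal Python model of how the option string is built from the attachment count; delete_attachment_options is a hypothetical name. It mirrors set "propeditcmd=--delete-attachment 1" followed by the for /L %%I in (2,1,!Attachments!) loop, which does not run at all when there is only one attachment.

```python
def delete_attachment_options(attachments):
    """Hypothetical model of building propeditcmd (called only if attachments >= 1)."""
    options = "--delete-attachment 1"
    # for /L %%I in (2,1,N) iterates I = 2, 3, ..., N
    for i in range(2, attachments + 1):
        options += " --delete-attachment %d" % i
    return options


print(delete_attachment_options(1))  # --delete-attachment 1
print(delete_attachment_options(3))
```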

    Optimized code for the entire video files processing task

    I have installed neither mkvmerge.exe nor mkvpropedit.exe, but I also looked at the referenced full code. Here is a rewritten, optimized version of your full code without any comments, which I could not fully test.

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "WindowTitle=%~n0"
    setlocal EnableDelayedExpansion
    for /F "tokens=1,2" %%G in ("!CMDCMDLINE!") do (
        if /I "%%~nG" == "cmd" if /I "%%~H" == "/c" (
            endlocal
            start %SystemRoot%\System32\cmd.exe /D /K %0
            if not errorlevel 1 exit /B
            setlocal EnableDelayedExpansion
        )
    )
    title !WindowTitle!
    endlocal
    
    for /F delims^=^=^ eol^= %%G in ('set ^| %SystemRoot%\System32\findstr.exe /B /I /L /V "ComSpec= PATH= PATHEXT= SystemRoot= TEMP= TMP="') do set "%%G="
    
    if exist "%~dp0mkvmerge.exe" (set "ToolsPath=%~dp0") else if exist mkvmerge.exe (set "ToolsPath=%CD%") else for %%I in (mkvmerge.exe) do set "ToolsPath=%%~dp$PATH:I"
    if not defined ToolsPath echo ERROR: Could not find mkvmerge.exe!& exit /B 2
    if "%ToolsPath:~-1%" == "\" set "ToolsPath=%ToolsPath:~0,-1%"
    if not exist "%ToolsPath%\mkvpropedit.exe" echo ERROR: Could not find mkvpropedit.exe!& exit /B 2
    
    for /F "tokens=*" %%G in ('%SystemRoot%\System32\chcp.com') do for %%H in (%%G) do set /A "CodePage=%%H" 2>nul
    %SystemRoot%\System32\chcp.com 65001 >nul 2>&1
    
    del /A /F /Q Errors.txt ExtraTracksList.txt 2>nul
    
    (
    set "ToolsPath="
    set "CodePage="
    
    for /F "delims=" %%G in ('dir *.mkv /A-D-H /B /S 2^>nul') do (
        echo --^> Processing file "%%G" ...
        setlocal
        set "FullFileName=%%G"
        for /F "tokens=1,4 delims=: " %%H in ('^""%ToolsPath%\mkvmerge.exe" -i "%%G" --ui-language en^"') do (
            if /I "%%I" == "audio" (
                set /A AudioTracks+=1
                setlocal EnableDelayedExpansion
                if !AudioTracks! == 2 echo !FullFileName!>>ExtraTracksList.txt
                endlocal
            ) else if not defined SkipFile if /I "%%I" == "subtitles" (
                echo --^> "%%~nxG" has subtitles
                "%ToolsPath%\mkvmerge.exe" -o "%%~dpnG.nosubs%%~xG" -S -M -T -B --no-global-tags --no-chapters --ui-language en "%%G"
                if not errorlevel 1 (
                    echo --^> Deleting old file ...
                    del /F "%%G"
                    echo --^> Renaming new file ...
                    ren "%%~dpnG.nosubs%%~xG" "%%~nxG"
                ) else (
                    echo Warnings/errors generated during remuxing, original file not deleted, check Errors.txt
                    "%ToolsPath%\mkvmerge.exe" -i --ui-language en "%%G">>Errors.txt
                    del "%%~dpnG.nosubs%%~xG" 2>nul
                )
                set "SkipFile=1"
            ) else if /I "%%H" == "Attachment"  (
                set /A Attachments+=1
            ) else if /I "%%H" == "Global" (
                set "TagsAll=--tags all:"
            ) else if /I "%%H" == "Chapters" (
                set "Chapters=--chapters """
            )
        )
        if not defined SkipFile (
            set "OnlyFileName=%%~nxG"
            setlocal EnableDelayedExpansion
            if defined Attachments (
                set "PropEditOptions= --delete-attachment 1"
                for /L %%H in (2,1,!Attachments!) do set "PropEditOptions=!PropEditOptions! --delete-attachment %%H"
            )
            if defined TagsAll set "PropEditOptions=!PropEditOptions! !TagsAll!"
            if defined Chapters set "PropEditOptions=!PropEditOptions! !Chapters!"
            if defined PropEditOptions (
                echo --^> "!OnlyFileName!" has extras ...
                "%ToolsPath%\mkvpropedit.exe" "!FullFileName!"!PropEditOptions!
            )
            endlocal
        )
        echo(
        echo ##########
        echo(
        endlocal
    )
    for /F "delims=" %%G in ('dir *.avi *.mp4 *.mov /A-D-H /B /S 2^>nul') do (
        echo Processing file "%%G" ...
        "%ToolsPath%\mkvmerge.exe" -o "%%~dpnG.mkv" -S -M -T -B --no-global-tags --no-chapters --ui-language en "%%G"
        if not errorlevel 1 (
            echo --^> Deleting old file ...
            del /F "%%G"
        ) else (
            echo --^> Warnings/errors generated during remuxing, original file not deleted.
            "%ToolsPath%\mkvmerge.exe" -i --ui-language en "%%G">>Errors.txt
            del "%%~dpnG.mkv" 2>nul
        )
        echo(
        echo ##########
        echo(
    )
    
    if exist Errors.txt for %%G in (Errors.txt) do if %%~zG == 0 del Errors.txt 2>nul
    %SystemRoot%\System32\chcp.com %CodePage% >nul
    )
    endlocal
    

    Removal of not needed environment variables in local environment

    The batch file has to process perhaps hundreds or even thousands of files using multiple environment variables.

    SETLOCAL and ENDLOCAL are used at least once per MKV file. SETLOCAL creates a copy of the current list of environment variables, which ENDLOCAL discards after the current MKV file has been processed.

    Other programs are also executed for each video file, and for each of them the Windows kernel library function CreateProcess also creates a copy of the current list of environment variables of the current process.

    For that reason it is helpful to use a local environment variables list which contains only the environment variables really needed during processing of the video files.

    The first FOR after setting the window title starts one more cmd.exe in the background with the following command line:

    C:\Windows\System32\cmd.exe /c set | C:\Windows\System32\findstr.exe /B /I /L /V "ComSpec= PATH= PATHEXT= SystemRoot= TEMP= TMP="
    

    The cmd.exe started in the background outputs with set the same list of environment variables and their values as the command process processing the batch file currently uses. The lines are passed to findstr, which searches case-insensitively (/I) and literally (/L) for the space-separated strings at the beginning of each line (/B) and outputs the inverted result (/V), i.e. all lines NOT beginning with one of the space-separated strings. So all environment variables are output with a = separating them from their values, except those searched for and found by findstr.

    The captured lines are processed by FOR using the equal sign as string delimiter and no end-of-line character, so that even an environment variable whose name starts with a semicolon is processed. FOR assigns just the variable name to the loop variable G, which is then used to remove the variable from the current environment variables list.

    So only the environment variables ComSpec, PATH, PATHEXT, SystemRoot, TEMP and TMP remain.
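    The filtering can be modelled in Python as a sketch. variables_to_delete is a hypothetical name and the sample set output is made up for illustration. The trailing = in each search string means that, for example, PATH= does not also keep a variable named PathTools.

```python
# Search strings passed to findstr /B /I /L /V
KEEP_PREFIXES = ["ComSpec=", "PATH=", "PATHEXT=", "SystemRoot=", "TEMP=", "TMP="]


def variables_to_delete(set_output):
    """Hypothetical model of the FOR loop that undefines all other variables."""
    names = []
    for line in set_output.splitlines():
        # findstr /B /I /L /V: drop lines beginning (case-insensitively) with a kept prefix
        if any(line.lower().startswith(p.lower()) for p in KEEP_PREFIXES):
            continue
        # delims== : the first token is the text before the first equal sign
        names.append(line.split("=", 1)[0])
    return names


SET_OUTPUT = "\n".join([
    r"ALLUSERSPROFILE=C:\ProgramData",
    r"ComSpec=C:\Windows\system32\cmd.exe",
    r"Path=C:\Windows\system32;C:\Windows",
    "PATHEXT=.COM;.EXE;.BAT;.CMD",
    r"PathTools=C:\Tools",
    r"TEMP=C:\Temp",
    "USERNAME=me",
])
print(variables_to_delete(SET_OUTPUT))  # ['ALLUSERSPROFILE', 'PathTools', 'USERNAME']
```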

    Use fully qualified file names to avoid unnecessary file system accesses

    Most people use just the file names of executables in batch files, without file extension and without path. That forces cmd.exe to search first in the current directory and then in all directories specified in the environment variable PATH for a file with one of the file extensions specified in the environment variable PATHEXT. That results in thousands of file system accesses when processing hundreds of files in a loop that calls executables for each file.

    All these file system accesses can be avoided by specifying each executable with its fully qualified file name in the batch file. That does not mean the batch file must already contain the fully qualified file name of each executable; as the code above demonstrates, the full file names of the executables can also be determined once at the beginning of the batch file.

    The batch file first checks whether mkvmerge.exe is in the directory of the batch file and, if so, defines the environment variable ToolsPath with the full batch file path. Otherwise it searches the current directory for mkvmerge.exe and assigns the current directory path to ToolsPath if there is a file system entry (hopefully a file and not a directory) with the name mkvmerge.exe. Finally it searches for mkvmerge.exe in the directories of the environment variable PATH and, if found, assigns that directory path to ToolsPath.

    The batch file outputs an error message, restores the initial environment and exits if the executable mkvmerge.exe or mkvpropedit.exe cannot be found at all.

    %~dp0 and %%~dp$PATH:I expand to a path string always ending with a backslash. %CD% expands to a path string not ending with a backslash, unless the current directory is the root directory of a drive. For that reason an IF condition with a string comparison checks whether the path string assigned to ToolsPath ends with a backslash, in which case the environment variable is redefined without this backslash. The backslash is added in the code below wherever the path string of ToolsPath is referenced.
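    The search order and the trailing backslash handling can be sketched in Python. This is a model under assumptions: find_tools_path is a hypothetical name, the exists callable is a hypothetical stand-in for IF EXIST, and case-insensitive file names are not modelled.

```python
import ntpath


def find_tools_path(batch_dir, current_dir, path_dirs, exists):
    """Hypothetical model of how ToolsPath is determined.

    batch_dir   -- value of %~dp0 (always ends with a backslash)
    current_dir -- value of %CD% (ends with a backslash only for a drive root)
    path_dirs   -- directories listed in PATH
    exists      -- stand-in for IF EXIST, a callable taking a path
    """
    exe = "mkvmerge.exe"
    if exists(ntpath.join(batch_dir, exe)):
        tools_path = batch_dir
    elif exists(ntpath.join(current_dir, exe)):
        tools_path = current_dir
    else:
        # %%~dp$PATH:I: drive and path (with trailing backslash) of the first match
        tools_path = next((ntpath.join(d, "") for d in path_dirs
                           if exists(ntpath.join(d, exe))), None)
    if tools_path is None:
        return None  # the batch file prints an error and exits with code 2
    # if "%ToolsPath:~-1%" == "\" set "ToolsPath=%ToolsPath:~0,-1%"
    if tools_path.endswith("\\"):
        tools_path = tools_path[:-1]
    return tools_path


print(find_tools_path("C:\\Scripts\\", "D:\\", [], {r"D:\mkvmerge.exe"}.__contains__))  # D:
```

With ToolsPath defined as D:, the reference "%ToolsPath%\mkvpropedit.exe" becomes D:\mkvpropedit.exe, so there is never a doubled backslash.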

    Determination of current code page using a different method

    This time the first solution developed by Compo is used to determine the number of the current code page. It is similar to the other solution as it uses the same two FOR loops, but the command SET executed by the second FOR loop now evaluates an arithmetic expression, so that the last iteration assigns the code page number without the dot to the environment variable CodePage.

    Let us look again what happens on processing the string: Aktive Codepage: 850.

    First set /A "CodePage=Aktive" is executed, which defines the environment variable CodePage with the value 0 because Aktive is interpreted as an environment variable name and there is no such environment variable. Next set /A "CodePage=Codepage:" is executed with the same interpretation and the same result 0. Last set /A "CodePage=850." is executed, which outputs the error message Missing operator. to handle STDERR, redirected to the device NUL to suppress it. However, the value assigned to the environment variable CodePage is 850, as wanted.

    The advantage of this solution is that using %%H inside the arithmetic expression does not result in any file system access. So this solution is in general better, in my opinion.
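    A very simplified Python model of the behavior described above, for illustration only: set_a_value and code_page_via_set_a are hypothetical names, words are split on whitespace only for brevity, and the real set /A parser is far more complex. A word starting with digits yields that number; any other word is treated as a variable name evaluating to 0 here. No file system access is involved.

```python
import re


def set_a_value(word):
    # Simplified model of set /A "CodePage=<word>" as described above.
    match = re.match(r"\d+", word)
    return int(match.group()) if match else 0


def code_page_via_set_a(chcp_output):
    code_page = None
    for word in chcp_output.split():
        code_page = set_a_value(word)  # each iteration overwrites CodePage
    return code_page


print(code_page_via_set_a("Aktive Codepage: 850."))  # 850
```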

    How to avoid batch file accesses while processing the video files?

    I recommend reading Why is a GOTO loop much slower than a FOR loop and depends additionally on power supply?

    Conclusion: It is a good idea to put the entire code required to process hundreds or thousands of files into one command block, which the Windows command processor reads and parses just once.

    The problem in most cases is how to handle variables whose values change within the command block without using delayed expansion all the time, as that affects the processing of strings like file names. That is often not easy, but it is frequently possible, as the code above shows.

    The environment variables ToolsPath and CodePage can be undefined immediately at the beginning of the main code block because the command processor has already replaced all %ToolsPath% and %CodePage% references with the appropriate path and code page number strings before executing the first command set "ToolsPath=". So the environment variables list during execution of the first main FOR loop contains just the six environment variables kept by findstr.

    The Windows command processor does not access the batch file again until it has finished processing all video files and restored the original code page.

    Other special information about the code in second batch file

    The two text files with information collected while processing the video files are always deleted first with the command DEL, provided the file system does not prevent their deletion.

    for /F is used twice instead of for /R for the main FOR loops, to first load all names of the video files to process, with full path, into the memory of the Windows command processor and then process the video files, instead of iterating over the current file system entries as for /R does. This makes a big difference for the loop processing *.mkv files, especially for video files stored on a FAT32 or exFAT formatted drive. The file system entries change on processing an MKV file on NTFS formatted drives too, but on FAT32 and exFAT formatted drives the entries are not kept alphabetically sorted as on an NTFS formatted drive. Using for /R on a FAT32 or exFAT formatted drive could therefore result in processing an MKV file more than once or unexpectedly skipping one or more MKV files, due to the file allocation table changes caused by running mkvmerge or mkvpropedit on an MKV file.

    The commands SETLOCAL and ENDLOCAL are used for each MKV file to quickly restore the minimal environment variables list defined outside of the main FOR loops, which discards all changes made to the environment variables list while processing an MKV file.

    Executing mkvmerge.exe with its full path, the option -i and the full name of the current MKV file by one more cmd.exe, started with /c and the specified command line, is a bit tricky, taking into account that %ToolsPath% and %%G could also contain characters like & which must be interpreted as literal characters both by the cmd.exe processing the batch file and by the cmd.exe started in the background.

    The entire command line to execute by cmd.exe in the background must be enclosed in double quotes to be processed correctly by this cmd.exe instance. But the cmd.exe instance processing the batch file must interpret these two " as literal characters and not as the beginning or end of an argument string. Otherwise "" at the beginning would be interpreted by the cmd.exe processing the batch file as the beginning and the end of an empty argument string, and the tools path string would then no longer be enclosed in double quotes for that cmd.exe, which is problematic if it contains &, ' or ).

    For that reason the two double quotes enclosing the entire command line are escaped in the batch file with the caret character ^, so that the cmd.exe processing the batch file interprets these two double quotes as literal characters and not as the beginning/end of an argument string.

    The result is that "%ToolsPath%\mkvmerge.exe" and "%%G" are interpreted by both cmd.exe instances as double-quoted argument strings and can therefore contain any characters, which are then interpreted literally instead of with their otherwise special meaning.
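    The effect of the escaped outer quotes can be illustrated with a simplified Python model of cmd.exe's quote handling: an unescaped " toggles the quote state, and outside quotes a caret escapes the next character, so ^" does not toggle it. unquoted_specials is a hypothetical name, only a few special characters are checked, and the paths are made-up examples containing &.

```python
SPECIAL = set("&|<>)")  # a few characters with special meaning outside quotes


def unquoted_specials(command_line):
    """Hypothetical model: special characters cmd.exe would see outside quotes."""
    found = []
    in_quotes = False
    i = 0
    while i < len(command_line):
        ch = command_line[i]
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "^" and not in_quotes:
            i += 1  # skip the escaped character, it is taken literally
        elif ch in SPECIAL and not in_quotes:
            found.append(ch)
        i += 1
    return found


tools = r"C:\Tools & Co"        # hypothetical ToolsPath value
video = r"D:\Rock & Roll.mkv"   # hypothetical file name
escaped = f'^""{tools}\\mkvmerge.exe" -i "{video}" --ui-language en^"'
unescaped = f'""{tools}\\mkvmerge.exe" -i "{video}" --ui-language en"'
# After the carets are removed, cmd /c strips the first and the last quote:
inner = escaped.replace('^"', '"')[1:-1]

print(unquoted_specials(escaped))    # []
print(unquoted_specials(unescaped))  # ['&', '&']
print(unquoted_specials(inner))      # []
```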

    The information about audio tracks is always processed, independent of the order in which mkvmerge.exe outputs the information about the current MKV file. But all other information is no longer processed once the environment variable SkipFile is defined because the current MKV file has subtitles.
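    The per-file decisions of the optimized code can be sketched as a Python model, assuming the mkvmerge -i output format shown in the question. plan_for_file and for_f_tokens are hypothetical names; the remuxing itself is not modelled, only which options would be passed to mkvpropedit.

```python
def for_f_tokens(line, delims=": "):
    # for /F "delims=: ": skip leading delimiters, runs of delimiters separate tokens
    tokens, current = [], ""
    for ch in line:
        if ch in delims:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def plan_for_file(mkvmerge_output):
    """Hypothetical model of the decisions made for one MKV file."""
    audio_tracks = attachments = 0
    skip_file = tags_all = chapters = False
    for line in mkvmerge_output.splitlines():
        tokens = for_f_tokens(line)
        if not tokens:
            continue  # for /F ignores empty lines
        first = tokens[0].lower()                               # %%H (token 1)
        fourth = tokens[3].lower() if len(tokens) > 3 else ""   # %%I (token 4)
        if fourth == "audio":
            audio_tracks += 1  # counted even after SkipFile is defined
        elif not skip_file:
            if fourth == "subtitles":
                skip_file = True  # here the batch file remuxes without subtitles
            elif first == "attachment":
                attachments += 1
            elif first == "global":
                tags_all = True
            elif first == "chapters":
                chapters = True
    options = []
    if not skip_file:
        options += ["--delete-attachment %d" % i for i in range(1, attachments + 1)]
        if tags_all:
            options.append("--tags all:")
        if chapters:
            options.append('--chapters ""')
    return {"extra_audio": audio_tracks >= 2, "skip_file": skip_file,
            "propedit_options": options}


WITH_SUBS = """\
Track ID 0: video (AVC/H.264/MPEG-4p10)
Track ID 1: audio (Opus)
Track ID 2: subtitles (SubRip/SRT)
Attachment ID 1: type 'image/jpeg', size 30184 bytes, file name 'test.jpg'
Chapters: 5 entries
Global tags: 3 entries
"""
NO_SUBS = "\n".join(l for l in WITH_SUBS.splitlines() if "subtitles" not in l)

print(plan_for_file(WITH_SUBS)["skip_file"])        # True
print(plan_for_file(NO_SUBS)["propedit_options"])
```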

    The file Errors.txt is deleted at the end if it was created but has a size of 0 bytes.

    Usage help for the used Windows commands

    To understand the commands used and how they work, open a command prompt window, execute there the following commands, and read the displayed help pages for each command, entirely and carefully.

    • call /?
    • chcp /?
    • cmd /?
    • dir /?
    • del /?
    • echo /?
    • endlocal /?
    • exit /?
    • findstr /?
    • for /?
    • if /?
    • rem /?
    • set /?
    • setlocal /?
    • start /?
    • title /?

    See also Issue 7: Usage of letters ADFNPSTXZadfnpstxz as loop variable and the other chapters about general issues made by beginners in batch file coding.