Exuberant Ctags is not excluding files properly on Windows


When I execute Ctags like this:

ctags -V -R --exclude=*.js

it does not exclude the *.js files, as the log shows:

Reading initial options from command line
  Option: --exclude=*.js
adding exclude pattern: *.js
Reading command line arguments
OPENING app.js as JavaScript language file
sorting tag file

Here is the Ctags version:

Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Jul  9 2009, 17:05:35
  Addresses: <[email protected]>, http://ctags.sourceforge.net
  Optional compiled features: +win32, +regex, +internal-sort

I've tried surrounding the pattern with double quotes and with single quotes, but it still doesn't work.

How can I exclude *.js files when Ctags parses the files in a directory tree on Windows?


Solution

  • The Exuberant Ctags manual says about the option --exclude:

    ... If appropriate support is available from the runtime library of your C compiler, then pattern may contain the usual shell wildcards (not regular expressions) common on Unix (...). You can determine if shell wildcards are available on your platform by examining the output of the --version option, which will include "+wildcards" in the compiled feature list; otherwise, pattern is matched against file names using a simple textual comparison.

    Next, look at the last line that ctags.exe prints when run with ctags.exe --version:

    Optional compiled features: +win32, +regex, +internal-sort

    There is no +wildcards in this list. According to the manual, this means that the ctags.exe in use on Windows does not support a wildcard pattern like *.js and compares the pattern with the file names as plain text instead.
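
    This check of the compiled feature list can be scripted. The following is a minimal sketch, assuming ctags.exe is found via PATH; the messages are only illustrative:

    ```batch
    @echo off
    rem Search the --version output for the literal string +wildcards.
    ctags.exe --version | %SystemRoot%\System32\findstr.exe /L /C:"+wildcards" >nul
    if errorlevel 1 (
        echo No +wildcards: --exclude patterns are compared as plain text.
    ) else (
        echo +wildcards found: --exclude accepts shell wildcards.
    )
    ```

    FINDSTR exits with 1 when the string is not found, which is what the IF condition tests.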

    Before looking at several solutions, note the difference between starting the options list with -R -V and starting it with -V -R.

    With ctags -V -R, Ctags first outputs its complete internal initialization, listing

    • which parsers are installed,
    • which files are interpreted as header files,
    • which language mappings are set up,
    • which default exclude patterns are used and
    • which directories and files are searched for loading default options.

    When the options list starts with -R -V, the internal initialization is not output.

    So if the internal initialization is of no interest, do not specify -V as the first option on the command line.

    To understand the commands used in the batch files below and how they work, open a command prompt window, run the following commands there, and read the help displayed for each command carefully and completely.

    • del /?
    • dir /?
    • echo /?
    • endlocal /?
    • for /?
    • if /?
    • set /?
    • setlocal /?

    Read the Microsoft documentation about Using command redirection operators for an explanation of >> and 2>nul. On the FOR command line, the redirection operator > must be escaped with the caret character ^ so that the Windows command interpreter treats it as a literal character while parsing this command line. FOR then executes the embedded DIR command line in a separate command process started in the background with %ComSpec% /c and the DIR command line appended as additional arguments, and only there does 2>nul act as a redirection.

    Solution 1: Exclude language JavaScript completely

    One way to exclude the *.js files is to disable the language JavaScript completely with the command line:

    ctags.exe -R -V --languages=-JavaScript
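
    If the exact language name is in doubt, Ctags can list the names it knows. A short sketch, assuming the same ctags.exe as above and the option --list-languages as described in the Exuberant Ctags manual:

    ```batch
    rem Print the names of all languages supported by this ctags.exe.
    ctags.exe --list-languages
    rem Build the tags file recursively with the JavaScript parser disabled.
    ctags.exe -R --languages=-JavaScript
    ```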
    

    Solution 2: Specify on command line the name of each *.js file to exclude

    Another solution is a batch file that adds the name of each *.js file found in the current directory tree to the command line with the option --exclude:

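    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    set "ExcludeOptions="
    
    rem Add one --exclude option per distinct *.js file name in the tree.
    rem The JS_ variables only record which names were already added.
    for /R %%I in (*.js) do (
        if not defined JS_%%~nxI (
            set "JS_%%~nxI=1"
            set "ExcludeOptions=!ExcludeOptions! "--exclude=%%~nxI""
        )
    )
    
    rem ExcludeOptions is either empty or starts with a space.
    ctags.exe -R -V%ExcludeOptions%
    endlocal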
    

    It is enough to specify only the file name with file extension, without a (relative) path, because all *.js files should be ignored when the tags file is created, no matter in which directory they are.

    The code avoids duplicate file names to keep the command line as short as possible.

    The command FOR ignores *.js files with the hidden attribute set as well as directories with the hidden attribute set, but Ctags does not ignore hidden files and folders. The following code uses the command DIR instead, so that an exclude option is also added for hidden *.js files and for *.js files in hidden folders.

    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    set "ExcludeOptions="
    
    for /F "delims=" %%I in ('dir /A-D /B /S *.js 2^>nul') do (
        if not defined JS_%%~nxI (
            set "JS_%%~nxI=1"
            set "ExcludeOptions=!ExcludeOptions! "--exclude=%%~nxI""
        )
    )
    
    ctags.exe -R -V%ExcludeOptions%
    endlocal
    

    The minor disadvantage of this solution:
    A *.js file with an exclamation mark in its file name is not excluded. Because delayed expansion is enabled, a single exclamation mark is removed from the file name after %%~nxI is expanded, and a string between two exclamation marks is either removed completely or replaced by the value of an environment variable with that name.
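
    The effect can be observed with a small test. This is only a sketch: the file name wow!.js is a made-up example and the file does not need to exist.

    ```batch
    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    rem With delayed expansion enabled, the exclamation mark is expected to be lost.
    for %%I in ("wow!.js") do echo Enabled:  %%~nxI
    endlocal
    setlocal EnableExtensions DisableDelayedExpansion
    rem With delayed expansion disabled, the name should be output unchanged.
    for %%I in ("wow!.js") do echo Disabled: %%~nxI
    endlocal
    ```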

    Solution 3: Use a temporarily created list file with the names of all *.js files to exclude

    With many *.js files, it is probably better to write their file names into a temporary list file from which Ctags reads the names of the files to exclude, because the length of a command line is limited.

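    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "ExcludeListFile=%TEMP%\ExcludeList.tmp"
    rem Make sure no list file is left over from a previous run.
    del "%ExcludeListFile%" 2>nul
    
    rem Write each distinct *.js file name on its own line into the list file.
    for /R %%I in (*.js) do (
        if not defined JS_%%~nxI (
            set "JS_%%~nxI=1"
            echo %%~nxI>>"%ExcludeListFile%"
        )
    )
    
    rem Pass the list file to Ctags only if at least one name was written.
    set "ExcludeOption="
    if exist "%ExcludeListFile%" set "ExcludeOption= "--exclude=@%ExcludeListFile%""
    
    ctags.exe -R -V%ExcludeOption%
    
    del "%ExcludeListFile%" 2>nul
    endlocal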
    

    This solution does not require delayed expansion and therefore also works for *.js files with an exclamation mark in the file name.

    Here is nearly the same code again, this time also covering *.js files that have the hidden attribute set or are in a hidden folder:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "ExcludeListFile=%TEMP%\ExcludeList.tmp"
    del "%ExcludeListFile%" 2>nul
    
    for /F "delims=" %%I in ('dir /A-D /B /S *.js 2^>nul') do (
        if not defined JS_%%~nxI (
            set "JS_%%~nxI=1"
            echo %%~nxI>>"%ExcludeListFile%"
        )
    )
    
    set "ExcludeOption="
    if exist "%ExcludeListFile%" set "ExcludeOption= "--exclude=@%ExcludeListFile%""
    
    ctags.exe -R -V%ExcludeOption%
    
    del "%ExcludeListFile%" 2>nul
    endlocal
    

    Solution 4: Use a temporarily created list file with all files to parse

    It should be enough to specify the files to parse on the command line with a wildcard pattern, as this is supported by Ctags compiled for Windows.

    ctags.exe -R -V *.htm
    

    This command line should result in parsing all *.htm and *.html files in the entire directory tree of the current directory. The wildcard pattern also matches *.html files because their short 8.3 names have the file extension HTM. The Windows kernel function used by default to search for files matching a wildcard pattern always applies the pattern to both the long and the short file name.

    Several wildcard patterns can be specified on the command line instead of just one, for example for C/C++:

    ctags.exe -R -V *.c *.cpp *.h
    

    The problem is that Ctags version 5.8 does not search recursively for files matching the wildcard pattern, although the option -R is specified. It looks like the startup code added by the compiler used to build ctags.exe already searches for files matching the wildcard pattern. The main function of Ctags therefore gets an argument list in which the argument *.htm is already replaced by multiple arguments, each one the name of a matching file in the current folder.

    On Unix/Linux, the shell (sh, bash, ksh, ...) would likewise replace unquoted wildcard patterns like *.htm *.html by the names of all files in the current directory matching these two patterns before calling the Ctags executable.

    In other words, specifying the file types to parse on the command line does not work recursively and is therefore no solution here, as -R clearly means that a recursive parsing of files is wanted.

    What does work is temporarily creating a list file that contains the full path of every file to parse, and specifying the name of this list file on the command line.

    First, a version that ignores files and directories with the hidden attribute set because it uses for /R:

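    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "IncludeListFile=%TEMP%\IncludeList.tmp"
    del "%IncludeListFile%" 2>nul
    
    rem Collect the full path of every *.htm file in the tree, one per line.
    for /R %%I in (*.htm) do echo %%I>>"%IncludeListFile%"
    
    rem Option -L makes Ctags read the names of the files to parse from the list file.
    if exist "%IncludeListFile%" (
        ctags.exe -L "%IncludeListFile%" -V
        del "%IncludeListFile%" 2>nul
    )
    endlocal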
    

    Second, a version that also includes hidden files and files in hidden directories because it uses dir /A-D:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "IncludeListFile=%TEMP%\IncludeList.tmp"
    del "%IncludeListFile%" 2>nul
    
    for /F "delims=" %%I in ('dir /A-D /B /ON /S *.htm 2^>nul') do echo %%I>>"%IncludeListFile%"
    
    if exist "%IncludeListFile%" (
        ctags.exe -L "%IncludeListFile%" -V
        del "%IncludeListFile%" 2>nul
    )
    endlocal
    

    In both batch files, the single wildcard pattern *.htm can be replaced by a space-separated list of wildcard patterns with ? and *, like *.c *.cpp *.h, or by even more complex patterns.
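
    As an illustration, the for /R variant adapted for C/C++ sources might look like this. It is a sketch that reuses the variable names from the batch files above and changes only the set of patterns:

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "IncludeListFile=%TEMP%\IncludeList.tmp"
    del "%IncludeListFile%" 2>nul

    rem Several wildcard patterns separated by spaces form one FOR set.
    for /R %%I in (*.c *.cpp *.h) do echo %%I>>"%IncludeListFile%"

    if exist "%IncludeListFile%" (
        ctags.exe -L "%IncludeListFile%" -V
        del "%IncludeListFile%" 2>nul
    )
    endlocal
    ```

    A file whose long or short name matches more than one pattern may be written to the list more than once.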

    The DIR option /ON makes DIR sort the file names in each folder by name. This is not necessary on NTFS drives, as the New Technology File System always returns the list of file names sorted by name to the calling Windows kernel function, but FAT (File Allocation Table) drives (FAT16, FAT32, exFAT) do not. The order is not really important here, but the list of processed files is easier to review when it is sorted by name regardless of the file system of the current drive.

    This approach, a temporary list file containing the names of the files to parse, is the method used by text editors and IDEs with built-in support for Ctags.