
Why is a subfolder with space recognized as file on execution of my batch script?


In my code I'm searching only for files in a folder and all its subfolders. When the name of a subfolder contains a space between two words, this subfolder is recognized as a file, too. That is not the correct behavior. The parameter /a-d doesn't help here.

@echo on
Setlocal EnableDelayedExpansion 

set "input=C:\Users\NekhayenkoO\test\"**
set "output=C:\Users\NekhayenkoO\outputxml\"**

set string1=Well-Formed and valid
set string2=Well-Formed, but not valid
set string3=Not well-formed
set /a loop=0
set /a loop1=0
set /a loop2=0
set /a loop3=0

for /f %%a in ('dir /b /a-d /s %input%') do (
    CALL jhove -m PDF-hul -h xml -o %output%\%%~na.xml %%a
    if !ERRORLEVEL! EQU 0 (echo Errorlevel equals !errorlevel! )
    if !ERRORLEVEL! GEQ 1 (Errorlevel equals !errorlevel! )
    set /a loop3+=1
)

Output of the script when running it in directory C:\Users\NekhayenkoO\jhove-beta:

Setlocal EnableDelayedExpansion
set "input=C:\Users\NekhayenkoO\test\"**
set "output=C:\Users\NekhayenkoO\outputxml\"**
set string1=Well-Formed and valid
set string2=Well-Formed, but not valid
set string3=Not well-formed
set /a loop=0
set /a loop1=0
set /a loop2=0
set /a loop3=0
for /F %a in ('dir /b /a-d /s "C:\Users\NekhayenkoO\test\"') do (
echo Verarbeite %~na
 CALL jhove -m PDF-hul -h xml -o "C:\Users\NekhayenkoO\outputxml\\%~na.xml" "%a"
 if !ERRORLEVEL! EQU 0 (echo Errorlevel equals !errorlevel! )
 if !ERRORLEVEL! GEQ 1 (Errorlevel equals !errorlevel! )
 set /a loop3+=1
)

(
echo Verarbeite 757419577
 CALL jhove -m PDF-hul -h xml -o "C:\Users\NekhayenkoO\outputxml\\757419577.xml" "C:\Users\NekhayenkoO\test\757419577.pdf"
 if !ERRORLEVEL! EQU 0 (echo Errorlevel equals !errorlevel! )
 if !ERRORLEVEL! GEQ 1 (Errorlevel equals !errorlevel! )
 set /a loop3+=1
)
Verarbeite 757419577
Errorlevel equals 0
Verarbeite GBV58575165X
Errorlevel equals 0
Verarbeite GBV85882115X
java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictiona
        at edu.harvard.hul.ois.jhove.module.PdfModule.readDocCatalogDict(PdfModule.java:1344)
        at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:521)
        at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:803)
        at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:588)
        at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:455)
        at Jhove.main(Jhove.java:292)
Errorlevel equals 0
Verarbeite GBV858852357
Errorlevel equals 0
Verarbeite nicht_valide_PDF
Errorlevel equals 0
Verarbeite not_Wellformed_intern
Errorlevel equals 0
Verarbeite pp1788_text
Errorlevel equals 0
Verarbeite Rosetta_Testdatei
Errorlevel equals 0
Verarbeite script
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite GBV58525785X
Errorlevel equals 0
Verarbeite GBV58574517X
Errorlevel equals 0
Drücken Sie eine beliebige Taste . . .

Solution

    What is jhove?

    Oleg Nekhayenko, you have asked several jhove-related questions in the last few days, but you have always forgotten to explain what jhove is, which is important to know for all of your questions.

    Therefore I searched the web for jhove, quickly found the homepage JHOVE | JSTOR/Harvard Object Validation Environment, read its documentation and command-line interface description, and finally also downloaded jhove-1_11.zip from the SourceForge project page of JHOVE.

    All this was necessary for me to find out that jhove is a Java application, which is started on Linux, and perhaps also on Mac, with the shell script jhove and on Windows with the batch file jhove.bat to make it easier to use.

    You could have saved yourself and all readers of your questions a lot of time if you had written jhove.bat instead of just jhove in your code snippets, or had at least mentioned somewhere that jhove is a batch file.

    Assigning a value/string to an environment variable

    I suggest first reading the answer on
    Why is no string output with 'echo %var%' after using 'set var = text' on command line?
    and then looking at these two lines:

    set "input=C:\Users\NekhayenkoO\test\"**
    set "output=C:\Users\NekhayenkoO\outputxml\"**
    

    I don't know why there are two asterisks at the end of those two command lines. But that does not really matter, as everything after the last double quote, here both asterisks, is ignored when the two paths are assigned to the two environment variables.

    This can be seen in the posted output of the batch file, as no asterisk is output on these lines:

    for /F %a in ('dir /b /a-d /s "C:\Users\NekhayenkoO\test\"') do (
    
    CALL jhove -m PDF-hul -h xml -o "C:\Users\NekhayenkoO\outputxml\\757419577.xml" "C:\Users\NekhayenkoO\test\757419577.pdf"
    

    There is no asterisk anywhere. So the environment variables input and output are obviously defined without the asterisks at the end, which is even good here.
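    A minimal sketch to try this out, reusing the input path from the question: the first assignment shows that text after the last double quote is ignored, and the second one shows the pitfall explained in the linked answer, where the spaces around = become part of the variable name and of the value. The variable name var is only used for illustration.

```batch
@echo off
setlocal EnableExtensions
rem Everything after the last double quote is ignored on assignment,
rem so the two trailing asterisks do not become part of the value.
set "input=C:\Users\NekhayenkoO\test\"**
echo [%input%]
rem Without double quotes the spaces around = are kept: this defines
rem a variable named "var " with the value " text".
set var = text
echo [%var%] [%var %]
```

    The first echo outputs the path without asterisks, the second one outputs an empty pair of brackets for the undefined variable var followed by the value of the variable "var ".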

    Enclosing directory and file names in double quotes

    The last paragraph on the last help page output by running cmd /? in a command prompt window explains which characters in a directory or file name require double quotes around the complete directory/file name.

    The space character is the argument delimiter on the command line, and therefore a directory or file name containing a space must always be enclosed in double quotes.
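    The same delimiter rule is the cause of the behavior described in the question. By default, for /F splits each line it processes at spaces and tabs and assigns only the first token to the loop variable. A full file path output by dir /b /s like C:\Users\NekhayenkoO\test\script files\x.pdf (a hypothetical name) is therefore truncated at the first space, and %%~na then yields just script, which looks as if the subfolder had been processed as a file. The following sketch creates a hypothetical test folder in the TEMP folder and compares the default behavior with the option "delims=", which turns off splitting.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Hypothetical test folder with spaces in its name in the TEMP folder.
set "Root=%TEMP%\delims demo"
md "%Root%\sub folder" 2>nul
echo test>"%Root%\sub folder\file.pdf"

rem Default options (delims=space and tab, tokens=1): only the part of
rem the path up to the first space is assigned to the loop variable.
for /F %%a in ('dir /b /a-d /s "%Root%\sub folder"') do set "Default=%%a"

rem "delims=" disables splitting, so the complete path is assigned.
for /F "delims=" %%a in ('dir /b /a-d /s "%Root%\sub folder"') do set "NoDelims=%%a"

echo Default:  "%Default%"
echo NoDelims: "%NoDelims%"
rd /s /q "%Root%"
```

    The path assigned with "delims=" must of course still be enclosed in double quotes wherever it is passed as an argument, as in "%%a".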

    Predefined environment variables on Windows

    Opening a command prompt window and running set outputs all environment variables defined for the current user account, including PATH and PATHEXT as well as USERNAME and USERPROFILE.

    The Wikipedia article about Windows Environment Variables explains the environment variables predefined by Windows. It is advisable to make use of them in batch files.

    Execution of applications and scripts on Windows

    If just the file name of an application or script is specified in a command prompt window or in a batch file, without file extension and without path, the Windows command interpreter searches first in the current directory and then in all directories of environment variable PATH for a file with the specified name and a file extension listed in environment variable PATHEXT. In this case the Windows command interpreter searches for jhove.*.

    The values of the environment variables PATH and PATHEXT can be seen by opening a command prompt window and running set path in it. This outputs all environment variables whose names start with PATH (compared case-insensitively), together with their current values.

    Next, it is important to know that when the Windows command interpreter searches for jhove.*, the NTFS file system returns the file names matching this search pattern sorted alphabetically. So if the current directory or any of the directories listed in PATH contains, for example, jhove.bat and jhove.exe, the NTFS file system returns jhove.bat first. This batch file is then used by the Windows command interpreter, as file extension BAT is listed by default in PATHEXT.

    But if the file system of the drive with the jhove.* files is FAT, FAT32 or exFAT, the file system returns the file names matching the search pattern in the order in which they are stored in the file allocation table, and therefore unsorted. So if a directory on a drive with any FAT file system contains jhove.bat and jhove.exe, it is unpredictable which file is executed by the Windows command interpreter when just jhove is specified in a batch file.

    For that reason it is always advisable to specify the application or script with its file name including the file extension. And if possible, the entire path to the application to run or the script to call should be specified as well.

    The Windows command interpreter does not need to search at all if the name of an application or script file is specified with file extension and complete path.

    See also the answer on Where is "START" searching for executables?
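    To see which file would actually be used, the following sketch looks up jhove.bat in the folders of PATH with the for modifier ~$PATH: described in the help of command for, and lists all candidate files with where.exe, which is included in Windows Vista and later. The name jhove.bat is the one from the question; the sketch only reports the result, and a real script would then use the full path found, for example with call "%JhoveBat%" and the arguments.

```batch
@echo off
setlocal EnableExtensions
rem The modifier ~$PATH: expands to the fully qualified name of the first
rem file with this name found in the folders of PATH, or to an empty string.
set "JhoveBat="
for %%I in (jhove.bat) do set "JhoveBat=%%~$PATH:I"
if defined JhoveBat (
    echo jhove.bat found: "%JhoveBat%"
) else (
    echo jhove.bat was not found in the folders of PATH.
)
rem where lists every jhove.* file with a file extension from PATHEXT
rem found in the current directory and in the folders of PATH.
where jhove 2>nul
```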

    Calling a batch file versus running an application

    A batch file is a script (text file) interpreted by the Windows command interpreter line by line, whereby a command block starting with ( and ending with the matching ) is parsed completely before it is executed, as if it were written on one line.

    An application is an executable (binary file) compiled for a specific processor or processor family and therefore does not need to be interpreted on execution. It already contains processor instructions (machine code).
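    The consequence of parsing a command block at once can be seen in a small sketch: a variable referenced with percent signs inside a for loop body is expanded only once, when the whole block is parsed, whereas a reference with exclamation marks and delayed expansion enabled, as used for ERRORLEVEL in the question, is expanded each time the command is executed. The variable name Count is only used for illustration.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "Count=0"
for %%I in (a b c) do (
    set /a Count+=1
    rem %Count% was replaced by 0 when the block was parsed,
    rem !Count! is expanded on each execution and shows 1, 2 and 3.
    echo %%I percent=%Count% delayed=!Count!
)
echo Count after the loop: %Count%
```

    The loop outputs percent=0 on all three lines, while delayed shows 1, 2 and 3.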

    Why the command call must be used to run another batch file from within a batch file is explained in detail by the answers on

    That is why it is very important to know what jhove is. It is a batch file and must therefore be called with command call, which also answers the question How to process 2 for loops after each other in batch?

    For help on command call, open a command prompt window and run call /?. The help output also explains which placeholders exist to reference the arguments of the batch file, whereby argument 0 is the name of the batch file.
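    A self-contained sketch of running one batch file from another one with call: it writes a hypothetical child batch file into the TEMP folder, runs it with some of the arguments from the question, and then continues with the next line. Without call, the Windows command interpreter would continue with the child batch file and the lines after it in the calling batch file would not be processed anymore.

```batch
@echo off
setlocal EnableExtensions
rem Hypothetical child batch file created only for this demonstration.
set "Child=%TEMP%\call_demo_child.bat"
>"%Child%" echo @echo off
>>"%Child%" echo echo Child received: %%*
>>"%Child%" echo set "ChildRan=1"

rem call runs the child batch file and returns to the next line afterwards.
rem The child echoes its complete argument list, double quotes included.
call "%Child%" -m PDF-hul -h xml "C:\some folder\file.pdf"
echo Back in the parent batch file.
set "ParentContinued=1"
del "%Child%"
```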

    Which command lines does jhove.bat contain?

    On unexpected behavior when calling a batch file from another batch file, it is important to know the code of the called batch file as well, because the error could be in the called batch file.

    Code of jhove.bat as stored in jhove-1_11.zip without instruction comments:

    @ECHO OFF
    SET JHOVE_HOME=%~dp0
    
    SET EXTRA_JARS=
    
    REM NOTE: Nothing below this line should be edited
    REM #########################################################################
    
    
    SET CP=%JHOVE_HOME%\bin\JhoveApp.jar
    IF "%EXTRA_JARS%"=="" GOTO FI
      SET CP=%CP%:%EXTRA_JARS
    :FI
    
    REM Retrieve a copy of all command line arguments to pass to the application
    
    SET ARGS=
    :WHILE
    IF %1x==x GOTO LOOP
      SET ARGS=%ARGS% %1
      SHIFT
      GOTO WHILE
    :LOOP
    
    
    REM Set the CLASSPATH and invoke the Java loader
    java -classpath %CP% Jhove %ARGS%
    

    Well, this batch code is not well written for the following reasons:

    1. The commands setlocal and endlocal are not used in the batch file to control the lifetime of the variables used by this batch file. See the answer on change directory command cd ..not working in batch file after npm install for more details. As it turned out, npm.bat is also a poorly coded batch file like jhove.bat.

    2. The command line SET JHOVE_HOME=%~dp0 defines the environment variable JHOVE_HOME with drive and path of the storage location of jhove.bat. The path returned by %~dp0 always ends with a backslash. If jhove*.zip was extracted into a directory with one or more spaces in its complete path, care must be taken to enclose the final string in double quotes wherever JHOVE_HOME is finally used.

      The command line SET CP=%JHOVE_HOME%\bin\JhoveApp.jar defines the environment variable CP by concatenating the path of batch file jhove.bat with a fixed path and name of the Java archive file. Here is already a small mistake: %~dp0 is a path always ending with a backslash, which is concatenated with a string starting with a backslash. So there are finally two backslashes in the path to the Java archive file. But Windows handles this error in the path, and therefore it does not really matter.

      If the user has not defined EXTRA_JARS, the environment variable CP is referenced unmodified on the final command line java -classpath %CP% Jhove %ARGS%. The error here is that %CP% is not enclosed in double quotes, which results in unexpected behavior if the user has indeed extracted jhove*.zip into a directory with one or more spaces in its complete path.

    3. A percent sign is missing at the end of command line SET CP=%CP%:%EXTRA_JARS.

    4. The author of jhove.bat obviously did not know about %*, which references all arguments passed to the batch file. Using %* instead of %ARGS% on the last command line makes the WHILE loop above completely unnecessary.
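    Points 2 and 4 can be checked with a small sketch: saved as a batch file in any folder and run with some arguments, it shows that %~dp0 always ends with a backslash, so bin\JhoveApp.jar can be appended without another backslash, and that %* passes all arguments unchanged, including double quotes. JhoveApp.jar is only used as a name here; the file does not need to exist for this test.

```batch
@echo off
setlocal EnableExtensions
rem Drive and path of this batch file, always with a trailing backslash.
set "HomeDir=%~dp0"
rem Append bin\JhoveApp.jar without a leading backslash; double quotes
rem around the result keep a path with spaces together as one argument.
set "CP=%HomeDir%bin\JhoveApp.jar"
echo Class path: "%CP%"
rem The complete argument list exactly as passed to this batch file.
echo Arguments:  %*
```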

    Much better for jhove.bat would be:

    @echo off
    setlocal EnableExtensions
    set "JHOVE_HOME=%~dp0"
    
    set "EXTRA_JARS="
    
    REM NOTE: Nothing below this line should be edited
    REM #########################################################################
    
    set "CP=%JHOVE_HOME%bin\JhoveApp.jar"
    rem Java on Windows separates class path entries with ; and not with :
    if not "%EXTRA_JARS%"=="" set "CP=%CP%;%EXTRA_JARS%"
    
    rem Set the CLASSPATH and invoke the Java loader
    java.exe -classpath "%CP%" Jhove %*
    endlocal
    

    The Windows command interpreter must be able to find the executable java.exe via environment variable PATH.

    Final batch code for usage

    I suggest using the following code for this task in case jhove.bat should not be replaced by the working code above:

    @echo off
    setlocal EnableExtensions
    set "InputFolder=%USERPROFILE%\test"
    set "OutputFolder=%USERPROFILE%\outputxml"
    
    echo Searching for bin\JhoveApp.jar in:
    echo.
    set "SearchPath=%CD%;%PATH%"
    set "SearchPath=%SearchPath:)=^)%"
    for /F "delims=" %%I in ('echo %SearchPath:;=^&ECHO %') do (
        echo    %%I
        if exist "%%~I\bin\JhoveApp.jar" (
            set "JHOVE_HOME=%%~I"
            goto RunJHOVE
        )
    )
    echo.
    echo Error reported by %~f0:
    echo.
    echo Could not find bin\JhoveApp.jar in current directory and folders of PATH.
    echo.
    endlocal
    pause
    goto :EOF
    
    :RunJHOVE
    if "%JHOVE_HOME:~-1%" == "\" (
        set "CP=%JHOVE_HOME%bin\JhoveApp.jar"
    ) else (
        set "CP=%JHOVE_HOME%\bin\JhoveApp.jar"
    )
    echo.
    echo Using %CP%
    
    md "%OutputFolder%" 2>nul
    
    rem for /F "delims=" %%I in ('dir /A-D /B /S "%InputFolder%\*" 2^>nul') do (
    rem     java.exe -classpath "%CP%" Jhove -m PDF-hul -h xml -o "%OutputFolder%\%%~nI.xml" "%%I"
    rem )
    
    for /R "%InputFolder%" %%I in (*) do (
        java.exe -classpath "%CP%" Jhove -m PDF-hul -h xml -o "%OutputFolder%\%%~nI.xml" "%%I"
    )
    
    endlocal
    

    The input and output folder paths are defined without a backslash at the end and without asterisks, using the predefined environment variable USERPROFILE.

    A slightly modified version of the code written by Magoo in his answer on Find the path used by the command line when calling an executable is used to find the Java archive of JHOVE. The batch file prints each folder it searches. If the file cannot be found, an error message is output and batch execution halts until the user presses any key.

    The class path variable CP is created taking into account whether the folder path ends with a backslash or not. Folder paths in PATH should be defined without a backslash at the end, but there are always installers which add folder paths to PATH not 100% correctly. However, it does not really matter if the result contains \\ anywhere within a path, as Windows handles this. That's also the reason why if exist "%%~I\bin\JhoveApp.jar" always works, although the tested path can contain two backslashes, depending on the folder path in PATH.

    Next, the output folder is created without first checking whether it already exists and without checking whether the folder creation was successful at all.
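    If such checks are wanted, they could look like the following sketch; the folder name is a hypothetical one in the TEMP folder. The backslash appended to the folder path in the if exist conditions makes them true only for a folder and not for a file with that name.

```batch
@echo off
setlocal EnableExtensions
rem Hypothetical output folder; the final script uses its own OutputFolder.
set "OutputFolder=%TEMP%\outputxml demo"
rem Create the folder only if it does not exist already.
if not exist "%OutputFolder%\" md "%OutputFolder%" 2>nul
rem Verify that the folder exists now, otherwise report and exit.
if not exist "%OutputFolder%\" (
    echo Error: Could not create folder "%OutputFolder%".
    exit /b 1
)
echo Output folder: "%OutputFolder%"
```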

    The batch code contains two solutions for running jhove on each file found recursively in the input folder. The first one is commented out. It would have the advantage of also working for hidden and system files. The second solution does not work for hidden and system files, but this is most likely not necessary here. The second solution is therefore the preferred one.

    To understand the commands used and how they work, open a command prompt window, execute the following commands there, and read all help pages displayed for each command entirely and very carefully.

    • echo /?
    • endlocal /?
    • for /?
    • goto /?
    • if /?
    • md /?
    • pause /?
    • set /?
    • setlocal /?

    Also read the Microsoft articles: