My code searches only for files in a folder and all of its subfolders. When the name of a subfolder contains a blank (space) between the words, this subfolder is treated as a file, too. This is not correct behavior. The parameter /a-d doesn't help here.
@echo on
Setlocal EnableDelayedExpansion
set "input=C:\Users\NekhayenkoO\test\"**
set "output=C:\Users\NekhayenkoO\outputxml\"**
set string1=Well-Formed and valid
set string2=Well-Formed, but not valid
set string3=Not well-formed
set /a loop=0
set /a loop1=0
set /a loop2=0
set /a loop3=0
for /f %%a in ('dir /b /a-d /s %input%') do (
CALL jhove -m PDF-hul -h xml -o %output%\%%~na.xml %%a
if !ERRORLEVEL! EQU 0 (echo Errorlevel equals !errorlevel! )
if !ERRORLEVEL! GEQ 1 (Errorlevel equals !errorlevel! )
set /a loop3+=1
)
Output of the script when run in directory C:\Users\NekhayenkoO\jhove-beta:
Setlocal EnableDelayedExpansion
set "input=C:\Users\NekhayenkoO\test\"**
set "output=C:\Users\NekhayenkoO\outputxml\"**
set string1=Well-Formed and valid
set string2=Well-Formed, but not valid
set string3=Not well-formed
set /a loop=0
set /a loop1=0
set /a loop2=0
set /a loop3=0
for /F %a in ('dir /b /a-d /s "C:\Users\NekhayenkoO\test\"') do (
echo Verarbeite %~na
CALL jhove -m PDF-hul -h xml -o "C:\Users\NekhayenkoO\outputxml\\%~na.xml" "%a"
if !ERRORLEVEL! EQU 0 (echo Errorlevel equals !errorlevel! )
if !ERRORLEVEL! GEQ 1 (Errorlevel equals !errorlevel! )
set /a loop3+=1
)
(
echo Verarbeite 757419577
CALL jhove -m PDF-hul -h xml -o "C:\Users\NekhayenkoO\outputxml\\757419577.xml" "C:\Users\NekhayenkoO\test\757419577.pdf"
if !ERRORLEVEL! EQU 0 (echo Errorlevel equals !errorlevel! )
if !ERRORLEVEL! GEQ 1 (Errorlevel equals !errorlevel! )
set /a loop3+=1
)
Verarbeite 757419577
Errorlevel equals 0
Verarbeite GBV58575165X
Errorlevel equals 0
Verarbeite GBV85882115X
java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictiona
at edu.harvard.hul.ois.jhove.module.PdfModule.readDocCatalogDict(PdfModule.java:1344)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:521)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:803)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:588)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:455)
at Jhove.main(Jhove.java:292)
Errorlevel equals 0
Verarbeite GBV858852357
Errorlevel equals 0
Verarbeite nicht_valide_PDF
Errorlevel equals 0
Verarbeite not_Wellformed_intern
Errorlevel equals 0
Verarbeite pp1788_text
Errorlevel equals 0
Verarbeite Rosetta_Testdatei
Errorlevel equals 0
Verarbeite script
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite java
Errorlevel equals 0
Verarbeite GBV58525785X
Errorlevel equals 0
Verarbeite GBV58574517X
Errorlevel equals 0
Drücken Sie eine beliebige Taste . . .
Oleg Nekhayenko, you have asked several jhove related questions in the last days, but you have always forgotten to explain what jhove is, which is important to know for all of your questions.
Therefore I searched the world wide web for jhove, quickly found the homepage JHOVE | JSTOR/Harvard Object Validation Environment, read its documentation and command-line interface description, and finally also downloaded jhove-1_11.zip from the SourceForge project page of JHOVE.
I did all this to find out that jhove is a Java application which is started on Linux, and perhaps also on Mac, using the shell script jhove, and on Windows using the batch file jhove.bat, to make it easier to use.
You could have saved yourself and all readers of your questions a lot of time if you had written jhove.bat instead of just jhove in your code snippets, or at least mentioned somewhere that jhove is a batch file.
I suggest first reading the answer on "Why is no string output with 'echo %var%' after using 'set var = text' on command line?" and then looking at these two lines:
set "input=C:\Users\NekhayenkoO\test\"**
set "output=C:\Users\NekhayenkoO\outputxml\"**
I don't know why there are two asterisks at the end of those two command lines. But that does not really matter, as both asterisks are ignored when the two paths are assigned to the two environment variables.
This can be seen in the posted output of the batch file, as no asterisk appears on the lines:
for /F %a in ('dir /b /a-d /s "C:\Users\NekhayenkoO\test\"') do (
CALL jhove -m PDF-hul -h xml -o "C:\Users\NekhayenkoO\outputxml\\757419577.xml" "C:\Users\NekhayenkoO\test\757419577.pdf"
There is no asterisk anywhere. So the environment variables input and output are obviously defined without the asterisks at the end, which is even good here.
The help output on running cmd /? in a command prompt window explains, in the last paragraph on the last help page, for which characters in a directory or file name double quotes must be used around the complete directory/file name.
The space character is the string delimiter on the command line, and therefore a directory or file name containing a space must always be enclosed in double quotes.
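The same delimiter rule applies to FOR /F, which by default splits every line on spaces and tabs and assigns only the first token to the loop variable. That is probably why truncated names appear in the posted output. A minimal sketch, assuming the input folder is %USERPROFILE%\test as in the question; the variable name InputFolder is chosen only for illustration:

```batch
@echo off
setlocal EnableExtensions
rem Assumed input folder; adapt it to the real location.
set "InputFolder=%USERPROFILE%\test"
rem "delims=" disables splitting on spaces and tabs, so %%I receives
rem the complete line output by DIR, including paths with spaces.
for /F "delims=" %%I in ('dir /A-D /B /S "%InputFolder%\*" 2^>nul') do echo Found: "%%I"
endlocal
```

Without "delims=", a file stored in a subfolder whose name contains a space would be passed on only up to the first space.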
Opening a command prompt window and running set outputs all environment variables defined for the current user account, including PATH and PATHEXT as well as USERNAME and USERPROFILE.
The Wikipedia article about Windows Environment Variables explains the environment variables predefined by Windows. It is advisable to make use of them in batch files.
If in a command prompt window or in a batch file just the file name of an application or script is specified without file extension and without path, the Windows command interpreter searches first in the current directory and next in all directories of environment variable PATH for a file with the specified name having a file extension listed in environment variable PATHEXT. In this case the Windows command interpreter searches for jhove.*.
The values of the environment variables PATH and PATHEXT can be seen by opening a command prompt window and running set path in this window, which outputs all environment variables whose names start with the string PATH (interpreted case-insensitively) together with their current values.
The next thing to know is that when the Windows command interpreter searches for jhove.*, the NTFS file system returns the file names matching this search pattern sorted alphabetically. So if the current directory or any of the directories listed in PATH contains, for example, jhove.bat and jhove.exe, the NTFS file system returns jhove.bat first. This batch file is used by the Windows command interpreter, as the file extension BAT is listed by default in PATHEXT.
But if the file system of the drive with the jhove.* files is FAT, FAT32 or exFAT, the file system returns the file names matching the search pattern in the order in which they are stored in the file allocation table, and therefore unsorted. So if a directory on a drive with any FAT file system contains jhove.bat and jhove.exe, it is unpredictable which file is executed by the Windows command interpreter when just jhove is specified in a batch file.
For that reason it is always advisable to specify an application or script with its file name and at least also its file extension. If possible, the entire path to the application to run or the script to call should be specified as well.
The Windows command interpreter does not need to search at all when the name of an application or script file is specified with file extension and complete path.
See also the answer on "Where is "START" searching for executables?"
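To check which file a name with extension resolves to via PATH, the %~$PATH:I modifier described in the output of for /? can be used. A minimal sketch; the variable name JhoveBatch is made up for illustration, and the modifier expands to an empty string if the file is not found in any directory listed in PATH:

```batch
@echo off
setlocal EnableExtensions
set "JhoveBatch="
rem %%~$PATH:I expands to the fully qualified name of the first
rem jhove.bat found in the directories listed in PATH.
for %%I in (jhove.bat) do set "JhoveBatch=%%~$PATH:I"
if defined JhoveBatch (
    echo Found: "%JhoveBatch%"
) else (
    echo jhove.bat was not found in PATH.
)
endlocal
```

The found path could then be used with call "%JhoveBatch%" so that no search is needed on each invocation.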
A batch file is a script (text file) interpreted by the Windows command interpreter line by line, whereby a command block starting with ( and ending with the matching ) is interpreted like a subroutine defined on one line.
An application is an executable (binary file) compiled with a compiler for a specific processor or processor family, and therefore does not need to be interpreted on execution. It already contains processor instructions (machine code).
Why the command call must be used to run another batch file from within a batch file is explained in detail in answers to related questions.
For that reason it is very important to know what jhove is. It is a batch file and must therefore be called with command call, which answers the question "How to process 2 for loops after each other in batch?"
For help on command call, open a command prompt window and run call /?. The help output also explains which placeholders exist to reference the arguments of the batch file, whereby argument 0 is the name of the batch file.
When calling a batch file from another batch file results in unexpected behavior, it is important to know the code of the called batch file as well, because the error could be in the code of the called batch file.
Code of jhove.bat as stored in jhove-1_11.zip without instruction comments:
@ECHO OFF
SET JHOVE_HOME=%~dp0
SET EXTRA_JARS=
REM NOTE: Nothing below this line should be edited
REM #########################################################################
SET CP=%JHOVE_HOME%\bin\JhoveApp.jar
IF "%EXTRA_JARS%"=="" GOTO FI
SET CP=%CP%:%EXTRA_JARS
:FI
REM Retrieve a copy of all command line arguments to pass to the application
SET ARGS=
:WHILE
IF %1x==x GOTO LOOP
SET ARGS=%ARGS% %1
SHIFT
GOTO WHILE
:LOOP
REM Set the CLASSPATH and invoke the Java loader
java -classpath %CP% Jhove %ARGS%
Well, this is not well-written batch code, for the following reasons:
The commands setlocal and endlocal are not used in the batch file to control the lifetime of the variables used by this batch file. See the answer on "change directory command cd ..not working in batch file after npm install" for more details. As it turned out, npm.bat is also a badly coded batch file like jhove.bat.
The command line SET JHOVE_HOME=%~dp0 defines the environment variable JHOVE_HOME with drive and path of the storage location of jhove.bat. The path returned by %~dp0 always ends with a backslash. If jhove*.zip was extracted into a directory with one or more spaces in the complete path, care must be taken to enclose the final string in double quotes wherever JHOVE_HOME is finally used.
The command line SET CP=%JHOVE_HOME%\bin\JhoveApp.jar defines the environment variable CP by concatenating the path to batch file jhove.bat with a fixed path and the name of the Java package. Here is already a small mistake, as %~dp0 is a path always ending with a backslash, which is concatenated with a string starting with a backslash. So there are finally two backslashes in the path to the Java package file. But the Windows kernel handles this error in the path, and therefore it does not really matter.
If no EXTRA_JARS is defined by the user, the environment variable CP is finally referenced unmodified on command line java -classpath %CP% Jhove %ARGS%. The error here is that %CP% is specified without being enclosed in double quotes, which results in unexpected behavior if jhove*.zip was indeed extracted by the user into a directory with one or more spaces in the complete path.
A percent sign is missing at the end of command line SET CP=%CP%:%EXTRA_JARS.
The writer of jhove.bat obviously did not know anything about %*, which, used on the last command line instead of %ARGS%, makes the WHILE loop above completely useless.
Much better for jhove.bat would be:
@echo off
setlocal EnableExtensions
set "JHOVE_HOME=%~dp0"
set "EXTRA_JARS="
REM NOTE: Nothing below this line should be edited
REM #########################################################################
set "CP=%JHOVE_HOME%bin\JhoveApp.jar"
rem On Windows the Java class path separator is a semicolon, not a colon.
if not "%EXTRA_JARS%"=="" set "CP=%CP%;%EXTRA_JARS%"
rem Set the CLASSPATH and invoke the Java loader
java.exe -classpath "%CP%" Jhove %*
endlocal
The executable java.exe must be found via environment variable PATH by the Windows command interpreter.
I suggest using the following code for this task in case jhove.bat should not be modified to the working code above:
@echo off
setlocal EnableExtensions
set "InputFolder=%USERPROFILE%\test"
set "OutputFolder=%USERPROFILE%\outputxml"
echo Searching for bin\JhoveApp.jar in:
echo.
set "SearchPath=%CD%;%PATH%"
set "SearchPath=%SearchPath:)=^)%"
for /F "delims=" %%I in ('echo %SearchPath:;=^&ECHO %') do (
echo %%I
if exist "%%~I\bin\JhoveApp.jar" (
set "JHOVE_HOME=%%~I"
goto RunJHOVE
)
)
echo.
echo Error reported by %~f0:
echo.
echo Could not find bin\JhoveApp.jar in current directory and folders of PATH.
echo.
endlocal
pause
goto :EOF
:RunJHOVE
if "%JHOVE_HOME:~-1%" == "\" (
set "CP=%JHOVE_HOME%bin\JhoveApp.jar"
) else (
set "CP=%JHOVE_HOME%\bin\JhoveApp.jar"
)
echo.
echo Using %CP%
md "%OutputFolder%" 2>nul
rem for /F "delims=" %%I in ('dir /A-D /B /S "%InputFolder%\*" 2^>nul') do (
rem java.exe -classpath "%CP%" Jhove -m PDF-hul -h xml -o "%OutputFolder%\%%~nI.xml" "%%I"
rem )
for /R "%InputFolder%" %%I in (*) do (
java.exe -classpath "%CP%" Jhove -m PDF-hul -h xml -o "%OutputFolder%\%%~nI.xml" "%%I"
)
endlocal
The input and output folder paths are defined without a backslash at the end and without asterisks, using the predefined environment variable USERPROFILE.
A slightly modified version of the code written by Magoo in his answer on "Find the path used by the command line when calling an executable" is used to find the Java package of JHOVE. The batch file prints the folders it searches. If the file cannot be found, it outputs an error message and halts batch execution until the user presses any key.
The class path variable CP is created taking into account whether the folder path ends with a backslash or not. Folder paths in PATH should be defined without a backslash at the end, but there are always installers which add folder paths not 100% correctly to PATH. However, it does not really matter if the result is \\ anywhere within a path, as the Windows kernel handles this. That's the reason why if exist "%%~I\bin\JhoveApp.jar" always works, although this file existence test might be done with two backslashes in the path, depending on the folder path in PATH.
Next the output folder is created without first checking whether the folder already exists and without checking whether folder creation was successful at all.
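If those checks are wanted, they could be added roughly as follows. This is only a sketch, assuming the same OutputFolder value as in the batch file above; the error message text and the exit code 1 are chosen freely for illustration:

```batch
@echo off
setlocal EnableExtensions
set "OutputFolder=%USERPROFILE%\outputxml"
rem The trailing backslash makes IF EXIST test for a folder, not a file.
if not exist "%OutputFolder%\" md "%OutputFolder%"
rem Check again to verify that the folder really exists now.
if not exist "%OutputFolder%\" (
    echo Error: Could not create folder "%OutputFolder%".
    endlocal
    exit /B 1
)
echo Output folder is ready: "%OutputFolder%"
endlocal
```

Testing for existence after md avoids relying on the exit code of md and also covers the case that a file with the same name prevents folder creation.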
The batch code contains two solutions for running jhove on each file found recursively in the input folder path. The first one is commented out. It would have the advantage of also working for hidden and system files. The second solution does not work for hidden and system files, but this is most likely not necessary here. The second solution is therefore the preferred one.
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
echo /?
endlocal /?
for /?
goto /?
if /?
md /?
pause /?
set /?
setlocal /?
And read also the Microsoft articles: