How to batch-OCR files in all subfolders with Tesseract-OCR from Windows cmd?


I am trying to use Tesseract-OCR to OCR all .png files, not only in the current folder (there is already an answer for that) but also in all of its subfolders. This works for a single folder:

for %%A in ("C:\Users\x\AppData\Local\Tesseract-OCR\temp\*.png") do C:\Users\x\AppData\Local\Tesseract-OCR\tesseract.exe "%%~fA" "%%~dpnxA"

I tried this to go through all the subfolders of the "temp" folder:

(for /r %%a in (*.png) do C:\Users\x\AppData\Local\Tesseract-OCR\tesseract.exe "%%~nxa" "%%~dpnxA")

but I got this error for every file:

C:\Users\x\AppData\Local\Tesseract-OCR\temp>C:\Users\x\AppData\Local\Tesseract-OCR\tesseract.exe "01.png" "%~dpnxA"
Tesseract Open Source OCR Engine v4.1.0-elag2019 with Leptonica
Error, cannot read input file 01.png: No such file or directory
Error during processing.

It is obvious that the script finds all the files in all of the subfolders, but then it can't read them for some reason.

Also, this script works for one folder, but when I try to use it with /r it doesn't go through all the subfolders:

:Start
   @Echo off
   Set _SourcePath=C:\Users\x\AppData\Local\Tesseract-OCR\temp\*.png
   Set _OutputPath=C:\Users\x\AppData\Local\Tesseract-OCR\temp\
   Set _Tesseract="C:\Users\x\AppData\Local\Tesseract-OCR\tesseract.exe"
:Convert
   For %%A in (%_SourcePath%) Do Echo Converting %%A...&%_Tesseract% %%A %_OutputPath%%%~nA 
:End   
   Set "_SourcePath="
   Set "_OutputPath="
   Set "_Tesseract="

Any ideas?


Solution

  • Perhaps this sort of thing is what you're looking for:

    @Echo Off
    SetLocal DisableDelayedExpansion
    
    Set "_SourcePath=%LocalAppData%\Tesseract-OCR\temp"
    Set "_SourceMask=*.png"
    Set "_OutputPath=%LocalAppData%\Tesseract-OCR\temp"
    Set "_TesserFile=%LocalAppData%\Tesseract-OCR\tesseract.exe"
    
    For /F "Delims=" %%A In (
        '""%__AppDir__%where.exe" /R "%_SourcePath%" "%_SourceMask%" 2>Nul"'
    ) Do Echo Converting %%A...& "%_TesserFile%" "%%A" "%_OutputPath%\%%~nA"
    

    Note that this assumes tesseract.exe allows the output directory to be specified and accepts double-quoted strings. It also assumes that you intend for all output files to be placed in %_OutputPath%.

    If you wanted them to be placed alongside their respective .png files, then perhaps this will do it:

    @Echo Off
    SetLocal DisableDelayedExpansion
    
    Set "_SourcePath=%LocalAppData%\Tesseract-OCR\temp"
    Set "_SourceMask=*.png"
    Set "_TesserFile=%LocalAppData%\Tesseract-OCR\tesseract.exe"
    
    For /F "Delims=" %%A In (
        '""%__AppDir__%where.exe" /R "%_SourcePath%" "%_SourceMask%" 2>Nul"'
    ) Do Echo Converting %%A...& "%_TesserFile%" "%%A" "%%~dpnA"
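
    As for why the For /R attempt in the question fails: %%~nxa expands to the file name and extension only, so tesseract.exe looks for 01.png in the current directory rather than in the subfolder where it was found. In addition, For variables are case-sensitive, so %%~dpnxA is not the loop variable %%a, which is why it shows up unexpanded in the error output. A minimal sketch of a For /R variant, assuming the same folder layout as above and that the output text files should sit next to each image, might look like this (it is an alternative to the where.exe approach, not a replacement for it):

    ```batch
    @Echo Off
    SetLocal DisableDelayedExpansion

    Rem Assumed locations, taken from the paths used in the question.
    Set "_SourcePath=%LocalAppData%\Tesseract-OCR\temp"
    Set "_TesserFile=%LocalAppData%\Tesseract-OCR\tesseract.exe"

    Rem For /R walks _SourcePath and every subfolder beneath it.
    Rem %%~fA is the full path of each .png file.
    Rem %%~dpnA is drive + path + base name, used here as the output base.
    For /R "%_SourcePath%" %%A In (*.png) Do (
        Echo Converting %%~fA...
        "%_TesserFile%" "%%~fA" "%%~dpnA"
    )
    ```

    Using the same letter case for the loop variable everywhere, and passing the full path with %%~fA, are the two changes that matter here.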