Tags: vb.net, winforms, file-io, visual-studio-2013, tesseract

How to check if Tesseract has finished processing a file?


I am writing a program in VB.NET that OCRs dozens of *.jpg files.

The basic idea is to manually select one folder containing the jpg files and a second folder where the txt files that Tesseract outputs are stored.

Tesseract takes a few seconds to OCR each jpg file (a bit longer in my case, because my computer is not fast).

The problem is that I want to OCR the jpg files one by one, so I need to know when Tesseract has finished processing each file. As soon as I execute the CMD command with the arguments, Tesseract creates an empty txt file. But I have no idea how to check when Tesseract has finished processing that file, so that the VB program can move on to the next jpg.

I thought about checking the size in bytes of the txt file: if it is not zero, the file has been processed by Tesseract.

At the moment I have a Do...Loop that goes through the jpg files, with a nested Do...Loop that checks whether the txt file is larger than 0 bytes. While it is not, it calls Thread.Sleep(5000):

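Do Until myFileSize > 0
    Thread.Sleep(5000)
    ' Re-read the size on every pass; otherwise myFileSize never changes
    ' (txtPath is a placeholder for the expected output file's path)
    myFileSize = New System.IO.FileInfo(txtPath).Length
Loop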

The idea is to keep sleeping as long as the txt file is still 0 bytes.

It's the only solution I know, but it doesn't seem to do what I want.

Which technique would you use to solve this?
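A common alternative to polling the output file is to wait on the Tesseract process itself: `Process.WaitForExit()` blocks until tesseract.exe terminates, and at that point the txt file has been fully written. The sketch below assumes Tesseract is started directly (not through CMD) using its `tesseract imagename outputbase` form, where Tesseract appends `.txt` to the output base name. The install path in `TesseractExe` and the names `OcrRunner`, `BuildArguments`, `OcrOneFile` and `OcrFolder` are hypothetical.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module OcrRunner
    ' Hypothetical install location; adjust to your machine.
    Const TesseractExe As String = "C:\Program Files\Tesseract-OCR\tesseract.exe"

    ' Builds the quoted arguments "<image>" "<outputbase>" for one image.
    ' Tesseract itself appends ".txt" to the output base name.
    Function BuildArguments(jpgPath As String, outputFolder As String) As String
        Dim outputBase As String = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(jpgPath))
        Return """" & jpgPath & """ """ & outputBase & """"
    End Function

    ' Starts Tesseract for one image and blocks until the process has exited.
    Function OcrOneFile(jpgPath As String, outputFolder As String) As Integer
        Dim psi As New ProcessStartInfo(TesseractExe, BuildArguments(jpgPath, outputFolder))
        psi.UseShellExecute = False
        psi.CreateNoWindow = True
        Using p As Process = Process.Start(psi)
            p.WaitForExit()      ' returns only after tesseract.exe has finished
            Return p.ExitCode    ' conventionally 0 on success
        End Using
    End Function

    ' Processes the jpg files strictly one after another.
    Sub OcrFolder(inputFolder As String, outputFolder As String)
        For Each jpg As String In Directory.GetFiles(inputFolder, "*.jpg")
            Dim code As Integer = OcrOneFile(jpg, outputFolder)
            If code <> 0 Then
                Console.WriteLine("Tesseract failed on " & jpg & " (exit code " & code.ToString() & ")")
            End If
        Next
    End Sub
End Module
```

In a WinForms app, calling `OcrFolder` (like calling `Thread.Sleep`) on the UI thread freezes the form until it returns. One option is to run it in the background from an `Async` event handler, e.g. `Await Task.Run(Sub() OcrRunner.OcrFolder(jpgFolder, txtFolder))`.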


Solution

  • Tesseract has a batch mode: you provide a list of the files to be processed, and it processes every one of them in a single run.
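A minimal sketch of the batch approach, assuming your Tesseract version accepts a text file listing image paths (one per line) in place of the image name, as the Tesseract manual page describes. According to that page, the recognized text of all listed images is then combined into a single output file, rather than one txt per jpg. `BatchOcr`, `WriteImageList`, `RunBatch` and the install path are hypothetical names.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module BatchOcr
    ' Hypothetical install location; adjust to your machine.
    Const TesseractExe As String = "C:\Program Files\Tesseract-OCR\tesseract.exe"

    ' Writes every jpg path in inputFolder to listFile, one per line,
    ' and returns how many images were listed.
    Function WriteImageList(inputFolder As String, listFile As String) As Integer
        Dim images As String() = Directory.GetFiles(inputFolder, "*.jpg")
        Array.Sort(images)                  ' process in a predictable order
        File.WriteAllLines(listFile, images)
        Return images.Length
    End Function

    ' Runs one Tesseract process over the whole list and waits for it.
    ' The combined text is expected in outputBase & ".txt".
    Function RunBatch(listFile As String, outputBase As String) As Integer
        Dim args As String = """" & listFile & """ """ & outputBase & """"
        Dim psi As New ProcessStartInfo(TesseractExe, args)
        psi.UseShellExecute = False
        psi.CreateNoWindow = True
        Using p As Process = Process.Start(psi)
            p.WaitForExit()
            Return p.ExitCode
        End Using
    End Function
End Module
```

For example, `BatchOcr.WriteImageList(jpgFolder, listFile)` followed by `BatchOcr.RunBatch(listFile, Path.Combine(txtFolder, "all"))` would leave the text in `all.txt`; because `RunBatch` waits on the process, it returns only once every image has been processed.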