I am writing a program in VB.NET that OCRs dozens of *.jpg files.
The basic idea is to manually select one folder that contains a batch of jpg files and a second folder where the txt files that Tesseract outputs are stored.
As you know, Tesseract takes a few seconds (a little more in my case, because my computer is not fast) to process and OCR a jpg file.
The problem is that I want to OCR the jpg files one by one, so I need to know when Tesseract has finished processing each file. As soon as I execute the CMD command with the arguments, Tesseract creates an empty txt file. But I have no idea how to check when Tesseract has finished processing the file, so that the VB program can start processing the next jpg.
I have thought about checking the length in bytes of the txt file: if it is not zero, the file has been processed by Tesseract.
At the moment I have a Do...Loop that processes each jpg file, and a nested Do...Loop that checks whether the txt file size is > 0 bytes. If it is not greater than zero bytes, it executes Thread.Sleep(5000).
    Do Until myFileSize > 0
        Thread.Sleep(5000)
    Loop
The intent is to put the code to sleep again and again while the txt file size is 0 bytes.
It's the only solution I know, but it doesn't seem to do what I am looking for.
Which technique would you use to solve this case?
Tesseract has a batch mode where you can provide a list of files to be processed, and it will process each one of them. Have a look here.
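If you would rather keep one Tesseract call per jpg, another option (a different technique from batch mode) is to start Tesseract with the .NET Process class and call WaitForExit, which blocks until the process has ended, so no polling of the txt file size is needed. The following is only a minimal sketch under some assumptions: the executable name "tesseract" is assumed to be on the PATH, and the module name OcrBatch, the helper RunTesseract, and the two command-line arguments (input folder, output folder) are hypothetical names chosen for illustration.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module OcrBatch
    ' Hypothetical helper: runs Tesseract on one image and blocks until it exits.
    ' Tesseract appends ".txt" to the output base name by itself.
    Function RunTesseract(imagePath As String, outputBase As String) As Integer
        Dim psi As New ProcessStartInfo()
        psi.FileName = "tesseract" ' assumption: tesseract is on the PATH
        ' Quote both paths so folders with spaces still work.
        psi.Arguments = """" & imagePath & """ """ & outputBase & """"
        psi.UseShellExecute = False
        psi.CreateNoWindow = True

        Using proc As Process = Process.Start(psi)
            proc.WaitForExit() ' returns only when Tesseract has finished
            Return proc.ExitCode
        End Using
    End Function

    Sub Main(args As String())
        Dim inputFolder As String = args(0)  ' folder with the jpg files
        Dim outputFolder As String = args(1) ' folder for the txt files

        ' Process the images strictly one after another.
        For Each imagePath As String In Directory.GetFiles(inputFolder, "*.jpg")
            Dim outputBase As String = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(imagePath))
            Dim exitCode As Integer = RunTesseract(imagePath, outputBase)
            If exitCode <> 0 Then
                Console.WriteLine("Tesseract failed on " & imagePath & " (exit code " & exitCode.ToString() & ")")
            End If
        Next
    End Sub
End Module
```

Because WaitForExit only returns after Tesseract has ended, the loop cannot move on to the next jpg too early, and the exit code gives a simple way to notice a failed run. In a WinForms program you would probably run the loop on a background thread so the window stays responsive.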