tesseract ocrmypdf

OCRmyPDF - Weird error message from tesseract


I get a strange error message when running an OCRmyPDF command.

My setup:

  • macOS Sequoia 15.2
  • OCRmyPDF 16.8.0 (installed by Brew)
  • tesseract 5.5.0 (installed by Brew)
  • Command: ocrmypdf -l deu+fra+eng --clean --force-ocr test.pdf test-out.pdf 2>> debugOCR.txt

I should mention that the command is triggered by Noodlesoft Hazel, and as far as I understand, Hazel executes shell commands in a dedicated environment. My setup worked fine for a few weeks, but while processing a batch of PDF files, the following error started to occur. Since then I have not been able to get it working again.

The debug file debugOCR.txt shows the following error:

1 [tesseract] Error in fopenReadStream: failed to open locally with tail 000001_ocr.png for filename /tmp/ocrmypdf.io.81a_o2mw/000001_ocr.png
1 [tesseract] Leptonica Error in findFileFormat: image file not found: /tmp/ocrmypdf.io.81a_o2mw/000001_ocr.png
1 [tesseract] Error in fopenReadStream: failed to open locally with tail PNG for filename PNG
1 [tesseract] Leptonica Error in pixRead: image file not found: PNG
1 [tesseract] Image file PNG cannot be read!
1 [tesseract] Error during processing.
SubprocessOutputError

In the folder /tmp I can't find any subfolder like /tmp/ocrmypdf.io.81a_o2mw/.

I also have to mention that when executing the following commands directly in Apple Terminal, they work fine:

ocrmypdf -l deu+fra+eng --clean --force-ocr test.pdf test-out.pdf 2>> debugOCR.txt
tesseract test.tiff output --oem 1 -l eng pdf 

Any hints where I have to dig deeper? Is ocrmypdf or tesseract missing some environment variables in the Hazel environment? Other hints?

Thanks a lot

AJ


Solution

  • https://github.com/tesseract-ocr/tesseract/issues/4333

    This is likely the issue.

    I faced the same problem while using wcgw mcp, which also has a separate terminal environment.

    Setting TMPDIR to //tmp helped me.
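
    For a Hazel rule, the workaround might look roughly like the sketch below. It assumes the embedded shell script is where the OCR command runs, and it reuses the command and file names from the question. The `//tmp` value is copied verbatim from this answer. The `command -v` guard is only an illustrative addition so the script does nothing when ocrmypdf is not on the PATH.

```shell
#!/bin/sh
# Sketch of a Hazel embedded script (assumed setup, not a verified fix).

# Show what the launching environment provided, to compare with Terminal.
echo "TMPDIR before override: ${TMPDIR:-<unset>}"

# Override the temp directory for this script and its child processes
# (ocrmypdf and the tesseract subprocess it starts).
TMPDIR="//tmp"
export TMPDIR

# Run the same command as in the question, only if ocrmypdf can be found.
if command -v ocrmypdf >/dev/null 2>&1; then
    ocrmypdf -l deu+fra+eng --clean --force-ocr test.pdf test-out.pdf 2>> debugOCR.txt
fi
```

    Running `echo "$TMPDIR"` once in Terminal and once inside Hazel is a quick way to confirm whether the two environments really differ on this variable.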