
PHP exec() and Tesseract: 'Cannot open input file'

I use Ghostscript to extract the images from PDF files as JPGs and then run Tesseract to save the text content, like this:

  • Ghostscript is located in c:\engine\gs\
  • Tesseract is located in c:\engine\tesseract\
  • the PDF/JPG/TXT files are in the web directory file/tmp/


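    $pathgs = "c:\\engine\\gs\\";
    $pathtess = "c:\\engine\\tesseract\\";
    $pathfile = "file/tmp/";

    // Strip images: render the PDF to JPG at 300 dpi (strip1.jpg, strip2.jpg, ...)
    $exec = "gs -dNOPAUSE -sDEVICE=jpeg -r300 -sOutputFile=".$pathfile."strip%d.jpg ".$pathfile."upload.pdf -q -c quit";
    exec($exec);

    // OCR: Tesseract writes its result to <outputbase>.txt, here file/tmp/ocr.txt
    $exec = "tesseract.exe '".$pathfile."strip1.jpg' '".$pathfile."ocr' -l eng";
    exec($exec, $msg);
    print_r($msg); // Tesseract's console output
    echo file_get_contents($pathfile."ocr.txt");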
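To see why a call like this fails, it can help to capture the program's error stream and exit code instead of only its standard output. A minimal sketch; `runAndCapture` is a hypothetical helper, not part of PHP, and it relies only on `exec()`'s optional third argument for the exit status:

```php
<?php
// Sketch: run a shell command, capturing stderr together with stdout plus the
// exit code, so a failing Tesseract call reports why instead of failing quietly.
// "2>&1" redirects stderr into stdout in both cmd.exe and POSIX shells.
function runAndCapture(string $cmd): array
{
    $output = [];
    $status = -1;
    exec($cmd . ' 2>&1', $output, $status);
    return ['status' => $status, 'output' => $output];
}

// Hypothetical usage with the OCR command from the question:
// $result = runAndCapture($exec);
// if ($result['status'] !== 0) { print_r($result['output']); }
```

A non-zero status together with the captured lines usually says more than an empty `$msg` array.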

Stripping the image (it's just one page) works fine, but Tesseract prints:

    [0] => Tesseract Open Source OCR Engine v3.01 with Leptonica
    [1] => Cannot open input file: 'file/tmp/strip1.jpg'

and no ocr.txt file is generated, which leads to a 'failed to open stream' error in PHP.
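Because `file_get_contents()` raises the 'failed to open stream' warning whenever `ocr.txt` is missing, one option is to check for the file before reading it. A minimal sketch, assuming Tesseract's usual behaviour of appending `.txt` to the output base name it is given; `readOcrResult` is a hypothetical helper:

```php
<?php
// Sketch: read Tesseract's text output only if it was actually produced.
// Assumes Tesseract appends ".txt" to the output base name (e.g. "ocr" -> "ocr.txt").
function readOcrResult(string $outBase): ?string
{
    $txtFile = $outBase . '.txt';
    if (!is_file($txtFile)) {
        return null; // OCR did not produce a file; inspect the command output instead
    }
    $text = file_get_contents($txtFile);
    return $text === false ? null : $text;
}

// Hypothetical usage with the question's paths:
// $text = readOcrResult($pathfile . 'ocr');
```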

  • Copying strip1.jpg into the c:/engine/tesseract/ folder and running Tesseract from the command line (tesseract strip1.jpg ocr.txt -l eng) works without any issue.
  • Replacing the putenv() call with exec("c:/engine/tesseract/tesseract ...") returns the same error as above.
  • Keeping strip1.jpg in the Tesseract folder and running exec("tesseract 'c:/engine/tesseract/strip1.jpg' ...") returns the same error as above.
  • Omitting the single quotes around path/strip1.jpg returns an empty array in $msg and still does not create the ocr.txt file.
  • Writing the command directly into the exec() call instead of using $exec makes no difference.
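One detail that may matter for the quoting attempts above: on Windows, `exec()` runs the command through `cmd.exe`, which does not treat single quotes as quoting characters, so `'file/tmp/strip1.jpg'` may reach Tesseract with the quotes still attached. The relative path is also resolved against PHP's current working directory, not the script's folder. A sketch that builds the command from absolute paths with `escapeshellarg()`, assuming the question's `file/tmp/` directory sits next to the script (`__DIR__`); `buildTesseractCommand` is a hypothetical helper:

```php
<?php
// Sketch: build the Tesseract command line from absolute, shell-escaped arguments.
// escapeshellarg() quotes for the platform's shell: double quotes on Windows
// (cmd.exe does not treat single quotes as quoting), single quotes elsewhere.
function buildTesseractCommand(string $binary, string $image, string $outBase, string $lang = 'eng'): string
{
    return escapeshellarg($binary) . ' '
        . escapeshellarg($image) . ' '
        . escapeshellarg($outBase)
        . ' -l ' . escapeshellarg($lang);
}

// Assumption: file/tmp/ is next to this script, so resolve it from __DIR__
// instead of relying on PHP's current working directory.
$dir = __DIR__ . DIRECTORY_SEPARATOR . 'file' . DIRECTORY_SEPARATOR . 'tmp' . DIRECTORY_SEPARATOR;
$cmd = buildTesseractCommand("c:\\engine\\tesseract\\tesseract.exe", $dir . 'strip1.jpg', $dir . 'ocr');
// exec($cmd, $msg);
```

The resulting string can be passed to `exec()` as before; quoting then follows whichever platform PHP is running on.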

What am I doing wrong?


  • Perhaps missing environment variables in PHP are the problem here. Have a look at my question here to see whether setting HOME or PATH sorts this out.
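Following that suggestion: a process started by `exec()` inherits the PHP process's environment, so `putenv()` can extend PATH before the call. A minimal sketch; `prependToPath` is a hypothetical helper, and the directory in the usage comment is the Tesseract path from the question:

```php
<?php
// Sketch: prepend a directory to PATH for processes started later by exec().
// putenv() changes the environment of the current PHP process, which
// child processes inherit.
function prependToPath(string $dir): void
{
    $current = getenv('PATH');
    $newPath = ($current === false || $current === '')
        ? $dir
        : $dir . PATH_SEPARATOR . $current;
    putenv('PATH=' . $newPath);
}

// Hypothetical usage with the Tesseract directory from the question:
// prependToPath("c:\\engine\\tesseract");
// HOME could be set the same way with putenv('HOME=...') and a suitable directory.
```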