
FileNotFoundError: error while using image_to_pdf_or_hocr


I was trying to create a searchable PDF using Tesseract in Python (through pytesseract) but got the error below. I am also working as a non-root Ubuntu user.

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-8-fd512bf1bdc4> in <module>
----> 1 pdf = pt.image_to_pdf_or_hocr(image,lang='hin',extension='pdf')
      2 with open('test.pdf', 'w+b') as f:
      3     f.write(pdf)

~/hindi_machine_readable/hindi_ocr/lib/python3.6/site-packages/pytesseract/pytesseract.py in image_to_pdf_or_hocr(image, lang, config, nice, extension, timeout)
    434     args = [image, extension, lang, config, nice, timeout, True]
    435 
--> 436     return run_and_get_output(*args)
    437 
    438 

~/hindi_machine_readable/hindi_ocr/lib/python3.6/site-packages/pytesseract/pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    284         run_tesseract(**kwargs)
    285         filename = kwargs['output_filename_base'] + extsep + extension
--> 286         with open(filename, 'rb') as output_file:
    287             if return_bytes:
    288                 return output_file.read()

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_5il97yg3.pdf'

It works fine when calling image_to_string; only image_to_pdf_or_hocr fails.


Solution

  • I solved the issue: I had forgotten to install libtesseract-dev. Since I am not a root user, I fetched and unpacked the package manually:

    apt-get download libtesseract-dev

    dpkg -x <debian_file> .

    This created a usr directory in the current working directory. After copying that usr directory into the virtualenv and setting the TESSDATA_PREFIX environment variable to the correct path, it worked.
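
The last step (pointing Tesseract at the unpacked data) can also be done from Python before pytesseract is called. Below is a minimal sketch, not the exact setup used above. The helper names `find_tessdata_dirs` and `configure_tessdata` are made up for illustration, and it assumes the unpacked tree contains a directory named `tessdata` holding `hin.traineddata`. The environment variable is spelled `TESSDATA_PREFIX` by Tesseract. Whether it should point at the `tessdata` directory itself (as assumed here, which matches Tesseract 4 and later) or at its parent depends on the Tesseract version.

```python
import os
from pathlib import Path


def find_tessdata_dirs(prefix):
    """Return every directory named 'tessdata' found anywhere under prefix."""
    return sorted(p for p in Path(prefix).rglob("tessdata") if p.is_dir())


def configure_tessdata(prefix, lang="hin"):
    """Set TESSDATA_PREFIX to the first tessdata dir containing <lang>.traineddata.

    Assumption: the variable should point at the tessdata directory itself.
    Older Tesseract releases expect its parent directory instead.
    """
    for candidate in find_tessdata_dirs(prefix):
        if (candidate / f"{lang}.traineddata").is_file():
            # Child processes (the tesseract binary started by pytesseract)
            # inherit this environment variable.
            os.environ["TESSDATA_PREFIX"] = str(candidate)
            return candidate
    raise FileNotFoundError(
        f"no tessdata directory with {lang}.traineddata under {prefix}"
    )


def image_to_searchable_pdf(image_path, out_path, lang="hin"):
    """Run OCR on image_path and write a searchable PDF to out_path."""
    import pytesseract  # third-party; imported here so the helpers above work without it

    pdf_bytes = pytesseract.image_to_pdf_or_hocr(image_path, lang=lang, extension="pdf")
    Path(out_path).write_bytes(pdf_bytes)
```

A typical call would be `configure_tessdata("/path/to/virtualenv/usr")` followed by `image_to_searchable_pdf("page.png", "test.pdf")`, where both paths are placeholders. Raising early when the language file is missing gives a clearer message than the FileNotFoundError on the temporary `/tmp/tess_*.pdf` file, which only shows that Tesseract produced no output.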