I was trying to create a searchable PDF using Tesseract in Python, but got the error below. I am also running as a non-root Ubuntu user.
FileNotFoundError Traceback (most recent call last)
<ipython-input-8-fd512bf1bdc4> in <module>
----> 1 pdf = pt.image_to_pdf_or_hocr(image,lang='hin',extension='pdf')
      2 with open('test.pdf', 'w+b') as f:
      3     f.write(pdf)

~/hindi_machine_readable/hindi_ocr/lib/python3.6/site-packages/pytesseract/pytesseract.py in image_to_pdf_or_hocr(image, lang, config, nice, extension, timeout)
    434 args = [image, extension, lang, config, nice, timeout, True]
    435
--> 436 return run_and_get_output(*args)
    437
    438

~/hindi_machine_readable/hindi_ocr/lib/python3.6/site-packages/pytesseract/pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    284 run_tesseract(**kwargs)
    285 filename = kwargs['output_filename_base'] + extsep + extension
--> 286 with open(filename, 'rb') as output_file:
    287     if return_bytes:
    288         return output_file.read()

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_5il97yg3.pdf'
The same image works fine with image_to_string.
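Because the traceback only shows pytesseract failing to open its temporary output file, it can help to run the same OCR job through the tesseract command line directly and read what Tesseract itself printed. A minimal sketch, assuming Tesseract's usual CLI form "tesseract IMAGE OUTPUTBASE -l LANG pdf" (which writes OUTPUTBASE.pdf); build_tesseract_pdf_cmd and run_tesseract_pdf are hypothetical helpers, not part of pytesseract:

```python
import subprocess


def build_tesseract_pdf_cmd(image_path, output_base, lang="hin", tesseract_cmd="tesseract"):
    """Command that asks Tesseract to write <output_base>.pdf from image_path."""
    # "pdf" selects Tesseract's pdf config, which enables the searchable-PDF output.
    return [tesseract_cmd, image_path, output_base, "-l", lang, "pdf"]


def run_tesseract_pdf(image_path, output_base, lang="hin", tesseract_cmd="tesseract"):
    """Run Tesseract directly and return (exit code, stderr text)."""
    cmd = build_tesseract_pdf_cmd(image_path, output_base, lang, tesseract_cmd)
    # PIPE + universal_newlines instead of capture_output=/text= (Python 3.7+),
    # so this also runs in the Python 3.6 virtualenv from the traceback.
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True
    )
    return result.returncode, result.stderr
```

For example, run_tesseract_pdf("page.png", "test") should leave test.pdf in the working directory when everything is set up; if it does not, the returned stderr shows Tesseract's own message instead of just the missing-file error.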
I solved the issue: I had forgotten to install libtesseract-dev. Since I am not a root user, I used:
apt-get download libtesseract-dev
dpkg -x <debian_file> .
This extracted a usr/ directory into the current working directory. After copying that usr/ directory into the virtualenv and setting the TESSDATA_PREFIX environment variable to the correct path, it worked.
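The environment-variable step can also be done from Python before calling pytesseract, since the tesseract process that pytesseract starts inherits os.environ. A minimal sketch, assuming Tesseract 4.x (which expects TESSDATA_PREFIX to name the tessdata directory itself, whereas 3.x expected its parent) and assuming the copied usr/ tree contains a tessdata directory with the needed .traineddata files; find_tessdata and set_tessdata_prefix are hypothetical helpers:

```python
import os
from pathlib import Path


def find_tessdata(root):
    """Return the first directory named "tessdata" under root (sorted order), or None."""
    # rglob walks root recursively; keep only directories, not stray files.
    matches = sorted(p for p in Path(root).rglob("tessdata") if p.is_dir())
    return matches[0] if matches else None


def set_tessdata_prefix(root):
    """Set TESSDATA_PREFIX to the tessdata directory found under root.

    Assumes Tesseract 4.x semantics: the variable names the tessdata
    directory itself. Processes started afterwards (such as the tesseract
    subprocess pytesseract launches) inherit this environment.
    """
    tessdata = find_tessdata(root)
    if tessdata is None:
        raise FileNotFoundError("no tessdata directory under {}".format(root))
    os.environ["TESSDATA_PREFIX"] = str(tessdata)
    return tessdata
```

For example, calling set_tessdata_prefix(os.path.join(sys.prefix, "usr")) at the top of the notebook, before image_to_pdf_or_hocr, assumes usr/ was copied to the virtualenv root (inside a virtualenv, sys.prefix is the virtualenv directory). Searching for the directory avoids hard-coding a version-specific path such as share/tesseract-ocr/4.00/tessdata.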