Tags: python, windows, tesseract, python-tesseract

tesseract.exe fails when called through pytesseract


I have a Python project that uses pytesseract to run OCR on an image and extract its text. I compiled the project with PyInstaller, and it works fine on my local machine, in a Windows Sandbox environment and on a Windows Server 2012 machine. However, when I deployed it to the production server, which runs the same OS (Windows Server 2012 R2), I got an error during the OCR step. First, a window popped up saying tesseract.exe has stopped working. Then I checked the event logs and found one Information entry followed by two Error entries related to this issue.
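Because the app is frozen with PyInstaller and deployed to several machines, it can help to confirm which tesseract.exe it actually launches on each one. The sketch below is a hypothetical helper (`find_tesseract` and its candidate list are assumptions, not part of pytesseract): it checks explicit install paths first, including the x86 path that appears in the crash logs below, and falls back to a PATH lookup.

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical default locations; the x86 path matches the one in the crash log.
DEFAULT_CANDIDATES = (
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES,
                   name: str = "tesseract") -> Optional[str]:
    """Return the first existing tesseract executable, or None if none is found.

    Explicit candidate paths win over the PATH lookup, so the frozen app uses
    the installation you expect rather than whatever happens to be on PATH.
    """
    for path in candidates:
        if os.path.isfile(path):
            return os.path.abspath(path)
    # Fall back to the same PATH search a plain "tesseract" command would use.
    return shutil.which(name)


if __name__ == "__main__":
    exe = find_tesseract()
    print("tesseract resolved to:", exe)
    # With pytesseract installed, the result would then be assigned to:
    # pytesseract.pytesseract.tesseract_cmd = exe
```

Logging the resolved path at startup makes it easy to compare what the production server runs against the machines where OCR works.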

-Information:

Fault bucket , type 0 Event Name: APPCRASH Response: Not available Cab Id: 0

Problem signature: P1: tesseract.exe P2: 0.0.0.0 P3: 639a0c83 P4: libtesseract-5.dll P5: 0.0.0.0 P6: 639a0c7f P7: c000001d P8: 001b639f P9: P10:

Attached files:

These files may be available here: C:\Users\usr\AppData\Local\Microsoft\Windows\WER\ReportArchive\AppCrash_tesseract.exe_45e0684ae65db46f6f2bc3f433eb0f313d8116_60142e40_19e538bc

Analysis symbol: Rechecking for solution: 0 Report Id: 29280992-e2a6-11ed-812b-005056a87ce5 Report Status: 2048 Hashed bucket:

-Error 1:

Faulting application name: tesseract.exe, version: 0.0.0.0, time stamp: 0x639a0c83

Faulting module name: libtesseract-5.dll, version: 0.0.0.0, time stamp: 0x639a0c7f

Exception code: 0xc000001d

Fault offset: 0x001b639f

Faulting process id: 0x17f8

Faulting application start time: 0x01d976b3f22cf105

Faulting application path: C:\Program Files (x86)\Tesseract-OCR\tesseract.exe

Faulting module path: C:\Program Files (x86)\Tesseract-OCR\libtesseract-5.dll

Report Id: 3012b6a3-e2a7-11ed-812b-005056a87ce5

Faulting package full name: 

Faulting package-relative application ID:
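For reference, the exception code in Error 1 is an NTSTATUS value: 0xC000001D is STATUS_ILLEGAL_INSTRUCTION, and P7 in the Information entry is the same code. When a process is terminated by an unhandled exception, Windows typically reports that exception code as the process exit code, so a Python caller that launches tesseract.exe would usually see 3221225501. A small, hypothetical decoder (`describe_exit_code` and its table are illustrative and cover only a handful of codes):

```python
# A few NTSTATUS codes that commonly show up as Windows crash exit codes.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC000007B: "STATUS_INVALID_IMAGE_FORMAT",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code as 32-bit hex plus its NTSTATUS name, if known."""
    unsigned = code & 0xFFFFFFFF  # negative (signed) values wrap to the DWORD form
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(0xC000001D))
```

The mask handles tools that report the same DWORD as a signed integer (-1073741795 here).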

-Error 2:

Windows cannot access the file  for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program tesseract.exe because of this error.

Program: tesseract.exe File:

The error value is listed in the Additional Data section.

User Action

  1. Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.
  2. If the file still cannot be accessed and:
     - It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.
     - It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.
  3. Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.
  4. If the problem persists, restore the file from a backup copy.
  5. Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.

Additional Data Error value: 00000000 Disk type: 0

The disk is not faulty. I suspected a permissions problem, so I ran my OCR program as administrator, but the issue persisted.

There is no internet connection in this environment, but I had already tested the project in another environment without internet access to make sure none of the packages require a connection.

Edit: I configured tesseract.exe to always run as administrator and then ran my command as administrator; the issue persisted.

Edit-2: I checked other possible causes. My compiled Python executable is 64-bit and it uses a 32-bit Tesseract. I suspected this mismatch might be the issue, but my earlier successful tests used the same setup.
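One way to double-check the bitness question is to read the Machine field from each binary's PE header and compare it with the Python process. Note that a 64-bit process can launch a 32-bit executable as a child process without trouble; a mismatch only matters when a DLL is loaded into the same process, and pytesseract runs tesseract.exe as a separate process. A minimal sketch (the helper names are hypothetical):

```python
import struct
import sys

MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)", 0xAA64: "ARM64"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE (EXE/DLL) header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew, at offset 0x3C of the DOS header, points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The COFF header's Machine field immediately follows the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")


def pe_machine_of(path: str) -> str:
    # The headers normally sit within the first few KB of the file.
    with open(path, "rb") as f:
        return pe_machine(f.read(4096))


if __name__ == "__main__":
    print("This Python process is", struct.calcsize("P") * 8, "bit")
    for p in sys.argv[1:]:
        print(p, "->", pe_machine_of(p))
```

For example, pass the tesseract.exe and libtesseract-5.dll paths from the crash log as arguments.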

Edit-3: I also checked the antivirus and firewall logs to make sure neither had blocked the DLL, but there were no entries related to it.

Edit-4: I deployed an update that moved the Tesseract installation to a different directory and tried again; it still fails. Running "tesseract images/eurotext.png - -l eng" from D://Tesserract-OCR/ works, but when I run the app, I assume the following line fails: "pytesseract.image_to_string(thresh, lang=self.lang_tesseract)"
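The manual test writes its output to stdout ("-"), while pytesseract writes the image to a temporary file and asks tesseract to write its result to a file. To narrow the gap, one option is to launch tesseract from Python with a file-in/file-out layout roughly like pytesseract's (input file, output base name, `-l <lang>`). This is a hypothetical helper, not pytesseract's own code; check `pytesseract.pytesseract.run_tesseract` in your installed version for the exact arguments it passes.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence, Tuple


def run_like_pytesseract(tesseract_cmd: str, image_path: str, lang: str = "eng",
                         extra_args: Sequence[str] = ()) -> Tuple[int, str, str]:
    """Run tesseract with file input and file output, roughly as pytesseract does.

    Hypothetical helper: returns (return code, stderr, recognized text).
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_base = str(Path(tmp) / "out")
        cmd = [tesseract_cmd, image_path, out_base, "-l", lang, *extra_args]
        proc = subprocess.run(cmd, capture_output=True,
                              encoding="utf-8", errors="replace")
        # For plain-text output, tesseract writes "<out_base>.txt".
        out_file = Path(out_base + ".txt")
        text = out_file.read_text(encoding="utf-8") if out_file.exists() else ""
    return proc.returncode, proc.stderr, text


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python run_like_pytesseract.py <path to tesseract.exe> <image file>
    rc, err, text = run_like_pytesseract(sys.argv[1], sys.argv[2])
    # On Windows a crash usually surfaces as an unsigned NTSTATUS value.
    print(f"exit code: {rc} (0x{rc & 0xFFFFFFFF:08X})")
    print(err if rc else text)
```

If this crashes the same way as the app while the stdout variant works, the difference lies in the file-based call rather than in pytesseract itself.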


Solution

  • After a long investigation, we found that a security application was blocking one of Tesseract's DLLs. Running tesseract.exe directly worked, but apparently this DLL is needed when Tesseract is invoked through pytesseract, which is where the issue appeared.
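If the security application needs an allow-list entry, many such products accept rules keyed on file path or file hash. A minimal sketch for producing SHA-256 hashes of the binaries in the Tesseract folder to hand to the security team (`binary_hashes` is a hypothetical helper; which rule type your product supports is an assumption to verify):

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict


def binary_hashes(install_dir: str) -> Dict[str, str]:
    """Map each .dll/.exe file in install_dir to its SHA-256 hex digest."""
    hashes: Dict[str, str] = {}
    for path in sorted(Path(install_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in {".dll", ".exe"}:
            hashes[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Example: python binary_hashes.py "C:\Program Files (x86)\Tesseract-OCR"
    for name, digest in binary_hashes(sys.argv[1]).items():
        print(f"{digest}  {name}")
```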