Search code examples
pythontesseractpython-tesseract

Symbol lookup error while using Tesseract


I've been using Tesseract 4, for a project for more than two months now. (This means that it's running on input images for more than two months.) The problem that I'm shown is:

multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/cse/.local/lib/python3.5/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "UKExtraction2.py", line 267, in tessBox
    op = pt.image_to_string(box[0],lang='hin+eng',config='--psm 6')
  File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 286, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 194, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "UKExtraction2.py", line 855, in <module>
    doItAllUpper("A0","UK4.csv","temp",27,70,"box",2,1000,firstPageCoordsUK,boxCoordUK,voterBoxCoordUK,internalBoxNumberCoordUK,externalBoxNumberCoordUK,addListInfoUK)
  File "UKExtraction2.py", line 776, in doItAllUpper
    doItAll(tempPDFName,outputCSV,2,pdfs,formatType,n_blocks,writeBlockSize,firstPageCoords,boxCoord,voterBoxCoord,internalBoxNumberCoord,externalBoxNumberCoord,addListInfo,pdfName)           
  File "UKExtraction2.py", line 617, in doItAll
    mainProcess(pdfName,(0,noOfPages-1),formatType,n_blocks,outputCSV,writeBlockSize,firstPageCoords,boxCoord,voterBoxCoord,internalBoxNumberCoord,externalBoxNumberCoord,addListInfo,bigPDFName,basePages)
  File "UKExtraction2.py", line 563, in mainProcess
    names_lst = cropAndOCR(im,(tup[0],tup[1]),formatType,boxCoord,voterBoxCoord,externalBoxNumberCoord,n_blocks,basePages)# Add the values of fpageInfo
  File "UKExtraction2.py", line 416, in cropAndOCR
    results = pool.map(tessBox,box_lst_divided)
  File "/home/cse/.local/lib/python3.5/site-packages/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 644, in get
    raise self._value
pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')

The pathos part is because of the fact that the project uses two threads to work. The important part is:

pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')

A user posted for this error on the tesseract-ocr google mailing group:

combine_tessdata: symbol lookup error: combine_tessdata: undefined symbol: _Z7tprintfPKcz

And got the answer that

"undefined symbol" indicate a broken installation

But as I said, this version is running without any errors for more than two months, so there shouldn't be any problems with the tesseract installation.

Another user posted the same problem at the group, but no one replied.

So, I assumed that the problem can be at two places:

  1. In the image provided to tesseract.
  2. Inside tesseract.

The image might not be an image altogether! That is, it might have 0x0 dimensions (though that isn't possible given the construction process of the image). But that is not possible, because the error I got was:

SystemError: tile cannot extend outside image

When I tried my hypothesis.

This means, that the image was present, so tesseract should have worked.

This also means that the problem is inside Tesseract. I'm no expert at tesseract's inner workings, but given the fact that this version worked until now correctly and there is no problem with the input image, what could be the problem with Tesseract?

P.S: I'm currently not near the system that runs the script, but I do know of the error that occurred. I might not be able to give exact details about the system, therefore I expect hypothesis for the problem.

P.S: The script is here.


Solution

  • Here is the solution for ubuntu 18.04

    Please first install the libraries which are required for tesseract-ocr

    sudo apt install libtesseract-dev libleptonica-dev liblept5
    

    Then simply install tesseract using command

    sudo apt install tesseract-ocr -y