macos ocr tesseract

Tesseract: different behaviours between version cf0b378 and version 3.05.01


I recently switched from a PC running Ubuntu 16.04 to a MacBook Pro running Mac OS X 10.12.6. I'm working on a program that uses tesseract (pytesseract 0.1.7) and OpenCV 3.3.0 for automatic text extraction from ID cards. The problem is that the program doesn't work properly on the MacBook: the OCR output is completely wrong and I don't understand why. What should I do to make it work on the MacBook Pro the same way it works on Ubuntu?

Configuration:

  • Ubuntu 16.04: tesseract was built from source

    $ tesseract --version
    tesseract cf0b378
    leptonica -1.74.1
      libjpeg 8d (libjpeg-turbo 1.4.2): libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
    
  • MacBook, Mac OS X 10.12.6: tesseract installed via Homebrew

    $ tesseract --version
    tesseract 3.05.01
    leptonica-1.74.4
      libjpeg 9b : libpng 1.6.32 : libtiff 4.0.8 : zlib 1.2.8
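
To see at a glance which components differ between the two machines, the two `tesseract --version` outputs above can be parsed and compared. This is only a minimal sketch: the helper names `parse_version_output` and `diff_versions` are hypothetical, and the parsing assumes the output layout shown in this question (one component per line, image libraries separated by colons).

```python
import re

# The two outputs pasted in the question.
UBUNTU = """tesseract cf0b378
leptonica -1.74.1
  libjpeg 8d (libjpeg-turbo 1.4.2): libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
"""

MACOS = """tesseract 3.05.01
leptonica-1.74.4
  libjpeg 9b : libpng 1.6.32 : libtiff 4.0.8 : zlib 1.2.8
"""


def parse_version_output(text):
    """Map each component name to its version string (hypothetical helper)."""
    versions = {}
    for line in text.splitlines():
        # Several libraries can share a line, separated by colons.
        for chunk in re.split(r"\s*:\s+", line.strip()):
            # Name and version are separated by spaces and/or a hyphen.
            match = re.match(r"([A-Za-z]+)[\s-]+(\S+)", chunk)
            if match:
                versions[match.group(1)] = match.group(2)
    return versions


def diff_versions(a, b):
    """Return {component: (version_a, version_b)} for components that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    changed = diff_versions(parse_version_output(UBUNTU), parse_version_output(MACOS))
    for name, (ubuntu, macos) in changed.items():
        print(f"{name}: {ubuntu} (Ubuntu) vs {macos} (macOS)")
```

With the outputs above, every component except zlib differs, so the tesseract build itself is only one of several candidates for the change in behaviour.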
    

Example: if I try this image: image.jpg

By running this command: tesseract image.jpg stdout

With tesseract cf0b378 I get: Gabo / M

With tesseract 3.05.01 I get: GM"


Solution

  • I solved this by building tesseract with the --HEAD option.

    brew update
    brew install tesseract --HEAD
    

    Now I have tesseract 4.00.00alpha and it works perfectly fine.

    Also, I just found this answer here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/rdaG14IDVu8/RtihYxlOAQAJ
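
Since the fix depends on which tesseract build ends up on the PATH, a Python program could check the version at startup and fail early instead of silently producing wrong OCR. The following is a rough sketch, not part of the original solution: the function names are hypothetical, and it assumes the first line of `tesseract --version` looks like `tesseract 4.00.00alpha`, as in the outputs quoted in the question. Standard output and standard error are merged because some tesseract releases print the version on standard error.

```python
import re
import subprocess


def major_version(version):
    """Return the leading major number of a dotted version string, or None.

    Builds identified only by a commit hash (e.g. "cf0b378") yield None.
    """
    match = re.match(r"v?(\d+)\.", version)
    return int(match.group(1)) if match else None


def installed_tesseract_version(cmd="tesseract"):
    """Return the version token from the first line of `<cmd> --version`."""
    result = subprocess.run(
        [cmd, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams: the version may be on stderr
        universal_newlines=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    if not lines:
        raise RuntimeError("no output from %s --version" % cmd)
    # Assumed layout: "tesseract <version>"; keep the last token.
    return lines[0].split()[-1]


def check_tesseract(minimum_major=4, cmd="tesseract"):
    """Raise if the installed tesseract is older than `minimum_major`."""
    version = installed_tesseract_version(cmd)
    major = major_version(version)
    if major is not None and major < minimum_major:
        raise RuntimeError(
            "tesseract %s found, but %d.x or newer is expected" % (version, minimum_major)
        )
    return version
```

Calling `check_tesseract()` before the first OCR call would have flagged the Homebrew 3.05.01 install. A hash-only version such as `cf0b378` cannot be ordered, so this sketch lets it pass rather than guessing.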