Tags: android, image-processing, ocr, tesseract, tess-two

Why does tess-two show a different result than Tesseract for Windows (by UB Mannheim) for the same image?


I am using tess-two to build an OCR feature for Android. I ran the same image through both, but the result on Android is very different from the result of Tesseract on the desktop.

The desktop version of tesseract gives a better result.

I am using the following lines on Android:

  val baseApi = TessBaseAPI()
  baseApi.init(dirPath, "eng")
  baseApi.setImage(mustOpen)
  val recognizedText = baseApi.utF8Text

On the desktop, I am using just this simple command:

tesseract image.png result

The sample image is: (linked image, not reproduced here)

The output for the image using tesseract for Desktop is:

VEGETABLE OF, RIVET een Sra) SUGAR, EDIBLE

VEGETABLE OIL, INVERT SUGAR S' SUGAR, CITRIC
RAISING 503 (ii), BAKING }, SALT,
SOLIDS (0.6 % [ DL-ACETYL TARTARIC

ACID ESTERS OF ‘AND

And the output using tess-two on Android is:

'm mm W7 ' ' iii-E:
mmmmfiwgmb Ian»: came
a” ( om | mmmfiéu
mmormuguomws _

Won mm .. . . ml
mumm I'm‘n
( .

This is complete gibberish. Please help.


Solution

  • As I commented on your post, I had the same problem and have just solved it, so I thought I would share.

    The first problem for me was that the image needs to be preprocessed for better results. I'm using OpenCV for the preprocessing. A good example of how to set it up is here: https://android.jlelse.eu/a-beginners-guide-to-setting-up-opencv-android-library-on-android-studio-19794e220f3c

    Then the image needs to be converted to a binary image. For me, the following gives the best results:

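    // Load the drawable as an OpenCV Mat (BGR channel order; loadResource can throw IOException).
    Mat plateMat = Utils.loadResource(this, R.drawable.plate);
    // Convert to grayscale, then blur lightly to suppress noise before thresholding.
    Mat gray = new Mat();
    Imgproc.cvtColor(plateMat, gray, Imgproc.COLOR_BGR2GRAY);
    Mat blur = new Mat();
    Imgproc.GaussianBlur(gray, blur, new Size(3, 3), 0);
    // Adaptive mean threshold: 75x75 neighbourhood, constant 10 subtracted from the local mean.
    Mat thresh = new Mat();
    Imgproc.adaptiveThreshold(blur, thresh, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY_INV, 75, 10);
    // Invert back: with a dark-on-light source, the text ends up black on white.
    Core.bitwise_not(thresh, thresh);
    // Copy the binary Mat into a Bitmap that can be passed to Tesseract.
    Bitmap bmp = Bitmap.createBitmap(thresh.width(), thresh.height(), Bitmap.Config.ARGB_8888);
    Utils.matToBitmap(thresh, bmp);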
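
To make the thresholding step easier to follow, here is a minimal pure-Java sketch of mean adaptive thresholding on a tiny grayscale array. It is an illustration only, not OpenCV's implementation: the class and method names are hypothetical, borders are handled by clamping to the nearest edge pixel, and OpenCV's exact rounding is not reproduced. It also shows that the inverted threshold followed by a bitwise NOT gives the same image as the non-inverted threshold.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of OpenCV or tess-two.
class AdaptiveThresholdSketch {

    // Mean of the blockSize x blockSize window centred on (row, col).
    // Neighbours outside the image are clamped to the nearest edge pixel.
    static double localMean(int[][] gray, int row, int col, int blockSize) {
        int half = blockSize / 2;
        int rows = gray.length;
        int cols = gray[0].length;
        long sum = 0;
        for (int dr = -half; dr <= half; dr++) {
            for (int dc = -half; dc <= half; dc++) {
                int r = Math.min(Math.max(row + dr, 0), rows - 1);
                int c = Math.min(Math.max(col + dc, 0), cols - 1);
                sum += gray[r][c];
            }
        }
        return (double) sum / (blockSize * blockSize);
    }

    // inverted = false: pixels brighter than (local mean - c) become 255, others 0.
    // inverted = true: the opposite assignment.
    static int[][] threshold(int[][] gray, int blockSize, double c, boolean inverted) {
        int[][] out = new int[gray.length][gray[0].length];
        for (int r = 0; r < gray.length; r++) {
            for (int col = 0; col < gray[0].length; col++) {
                boolean above = gray[r][col] > localMean(gray, r, col, blockSize) - c;
                out[r][col] = (above != inverted) ? 255 : 0;
            }
        }
        return out;
    }

    // Bitwise NOT for 8-bit values: 0 becomes 255 and 255 becomes 0.
    static int[][] bitwiseNot(int[][] img) {
        int[][] out = new int[img.length][img[0].length];
        for (int r = 0; r < img.length; r++) {
            for (int c = 0; c < img[0].length; c++) {
                out[r][c] = 255 - img[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Light background (200) with two dark "ink" pixels (40).
        int[][] gray = {
            {200, 200, 200, 200, 200},
            {200,  40, 200,  40, 200},
            {200, 200, 200, 200, 200},
        };
        int[][] direct = threshold(gray, 3, 10, false);
        int[][] viaInverse = bitwiseNot(threshold(gray, 3, 10, true));
        System.out.println(Arrays.deepToString(direct));
        System.out.println(Arrays.deepEquals(direct, viaInverse));
    }
}
```

In this toy input the two dark pixels come out as 0 and the background as 255, which is the dark-text-on-light-background form the snippet above hands to Tesseract.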
    

    Then I call Tesseract with the eng+osd languages (in this order). The trained data files are available here: https://github.com/tesseract-ocr/tessdata

    The Tesseract call looks like this:

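    TessBaseAPI tesseract = new TessBaseAPI();
    tesseract.setDebug(true);
    // The data path must contain a "tessdata" folder holding eng.traineddata and osd.traineddata.
    tesseract.init(getFilesDir().getAbsolutePath(), "eng+osd");
    // Pass in the preprocessed bitmap and read the recognized text.
    tesseract.setImage(bmp);
    String utf8 = tesseract.getUTF8Text();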
    

    NOW THE REAL DEAL

    The real reason I got a different result is simply that the Tesseract version installed with Homebrew on my Mac was 4.1.0, while the official tess-two repo still uses 3.05. Digging through the repo's issues, I found that there is a newer version built on Tesseract 4, but it lives in a different repo: https://github.com/adaptech-cz/Tesseract4Android

    Once I cloned it and used the AAR extracted from the project, the results were the same as on the desktop, and I can finally sleep in peace!
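
Since the root cause was a version mismatch, it can help to compare the engine versions before comparing OCR output. The sketch below is only an illustration: the class and method names are hypothetical, and the version strings are the ones quoted above (4.1.0 on the desktop, 3.05 in tess-two). On the desktop, `tesseract --version` prints the installed version.

```java
// Hypothetical helper; not part of tess-two or Tesseract4Android.
class TesseractVersionCheck {

    // Extracts the major version from a string such as "4.1.0" or "3.05".
    static int majorVersion(String version) {
        String trimmed = version.trim();
        int dot = trimmed.indexOf('.');
        return Integer.parseInt(dot >= 0 ? trimmed.substring(0, dot) : trimmed);
    }

    // True when both version strings share the same major version.
    static boolean sameMajorVersion(String a, String b) {
        return majorVersion(a) == majorVersion(b);
    }

    public static void main(String[] args) {
        String desktop = "4.1.0"; // version reported on the desktop in the answer
        String android = "3.05";  // version bundled with tess-two per the answer
        if (!sameMajorVersion(desktop, android)) {
            System.out.println("Major versions differ: " + desktop + " vs " + android
                    + "; do not expect identical OCR output.");
        }
    }
}
```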