c++ opencv ocr tesseract leptonica

Different Tesseract result for Mat and Pix


Goal

Get the same OCR quality from Tesseract when passing it an OpenCV Mat as when passing it a Leptonica Pix.

Environment

C++17, OpenCV 3.4.1, Tesseract 3.05.01, Leptonica 1.74.4, Visual Studio Community 2017, Windows 10 Pro 64-bit

Description

I'm working with Tesseract for OCR, and have found what I think is peculiar behaviour.

This is my input image: [image: input image for the OCR]

And this is my code:

#include "stdafx.h"
#include <iostream>
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

#pragma comment(lib, "ws2_32.lib")

using namespace std;
using namespace cv;
using namespace tesseract;

void opencvVariant(string titleFile);
void leptonicaVariant(const char* titleFile);

int main()
{
    cout << "Tesseract with OpenCV and Leptonica" << endl;

    const char* titleFile = "raptor-companion-2.jpg";
    opencvVariant(titleFile);
    leptonicaVariant(titleFile);

    cout << endl;
    system("pause");
    return 0;
}

void opencvVariant(string titleFile) {

    cout << endl << "OpenCV variant..." << endl;

    TessBaseAPI ocr;
    ocr.Init(NULL, "eng");
    Mat image = imread(titleFile);
    ocr.SetImage(image.data, image.cols, image.rows, 1, image.step);

    char* outText = ocr.GetUTF8Text();
    int confidence = ocr.MeanTextConf();

    cout << "Text: " << outText << endl;
    cout << "Confidence: " << confidence << endl;

    delete[] outText;  // GetUTF8Text returns a buffer the caller must free with delete[]
}

void leptonicaVariant(const char* titleFile) {

    cout << endl << "Leptonica variant..." << endl;

    TessBaseAPI ocr;
    ocr.Init(NULL, "eng");
    Pix *image = pixRead(titleFile);
    ocr.SetImage(image);

    char* outText = ocr.GetUTF8Text();
    int confidence = ocr.MeanTextConf();

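    cout << "Text: " << outText << endl;
    cout << "Confidence: " << confidence << endl;

    delete[] outText;     // GetUTF8Text returns a buffer the caller must free with delete[]
    pixDestroy(&image);   // release the Pix allocated by pixRead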
}

The functions opencvVariant and leptonicaVariant are basically the same, except that one uses the Mat class from OpenCV and the other uses Pix from Leptonica. Yet the results are quite different.

OpenCV variant...
Text: Rapton


Confidence: 68

Leptonica variant...
Text: Raptor Companion


Confidence: 83

As the output above shows, the Pix variant gives a much better result than the Mat variant. Since my code relies heavily on OpenCV for the computer vision steps before the OCR, it's essential for me that the OCR works well with OpenCV and its classes.

Questions

  • Why does Pix give a better result than Mat?
  • How could the code be changed to make the Mat variant as accurate as the Pix variant?

Solution

  • OpenCV's imread function reads an image as color by default, which means you get the pixels as interleaved BGRBGRBGR... bytes.
    In your example you pass 1 as the bytes-per-pixel argument of SetImage, which assumes the OpenCV image is grayscale. There are two ways of fixing that:

    1. Change your SetImage call so that it passes the actual number of channels of the OpenCV image:

      ocr.SetImage((uchar*)image.data, image.size().width, image.size().height, image.channels(), image.step1());

    2. Convert your OpenCV image to single-channel grayscale before calling SetImage, and keep 1 as the bytes-per-pixel argument:

      cv::cvtColor(image, image, cv::COLOR_BGR2GRAY);
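
To make the layout mismatch concrete, here is a small library-free sketch. It does not call OpenCV or Tesseract; the helper names (makeBgrRow, readAsOneBytePerPixel, readWithChannels) are hypothetical and only mimic, as far as I understand it, what happens when an interleaved 3-channel row is read as if it had 1 byte per pixel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one row of a 3-channel image as cv::imread
// lays it out in memory: interleaved B,G,R,B,G,R,... bytes.
// Each gray level is simply repeated for all three channels.
std::vector<std::uint8_t> makeBgrRow(const std::vector<std::uint8_t>& gray) {
    std::vector<std::uint8_t> row;
    row.reserve(gray.size() * 3);
    for (std::uint8_t g : gray) {
        row.push_back(g);  // B
        row.push_back(g);  // G
        row.push_back(g);  // R
    }
    return row;
}

// Mimics a consumer that is told "1 byte per pixel": it takes the first
// `cols` bytes of the row and treats each byte as one gray pixel.
// Assumes cols <= row.size().
std::vector<std::uint8_t> readAsOneBytePerPixel(const std::vector<std::uint8_t>& row,
                                                std::size_t cols) {
    return std::vector<std::uint8_t>(row.begin(), row.begin() + cols);
}

// Mimics a consumer that is told the real channel count: it advances
// `channels` bytes per pixel. Reading only the first channel is enough
// here because B = G = R in this toy row.
// Assumes row.size() >= cols * channels.
std::vector<std::uint8_t> readWithChannels(const std::vector<std::uint8_t>& row,
                                           std::size_t cols,
                                           std::size_t channels) {
    std::vector<std::uint8_t> out;
    out.reserve(cols);
    for (std::size_t c = 0; c < cols; ++c) {
        out.push_back(row[c * channels]);
    }
    return out;
}
```

For a row built from the gray levels 10 20 30 40 50 60, readAsOneBytePerPixel(row, 6) gives 10 10 10 20 20 20, that is, only the first third of the row with every pixel tripled, while readWithChannels(row, 6, 3) gives back the original six values. If Tesseract treats the buffer the same way, that may be why only the beginning of the text ("Rapton") was recognized in the Mat variant.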