c++, tesseract, leptonica

Leptonica failing to deskew 45 and 135 degree rotated text


I'm having an issue with Leptonica version 1.80.0 on a Red Hat 9 machine. I'm using the function pixDeskewGeneral(...) to deskew some images before they are processed by Tesseract for OCR text extraction.

This function works well on images rotated clockwise by 5°, 10°, 15°, and so on up to 40°: the angle is detected correctly and the image is deskewed.

But when the text is rotated by 45° or 135°, pixDeskewGeneral(...) does not detect the correct angle, and the image is not deskewed.

Here is an example that reproduces the issue:

/// \file deskew.cpp
/// Image deskewing utility
///
/// Build it by running:
///     g++ -o deskew deskew.cpp -llept

#include <iostream>
#include <cstring>
#include <leptonica/allheaders.h>

/// Get image format based on file extension
l_int32 getFormatFromExtension(const char* filename) {
    const char* extension = strrchr(filename, '.');
    if (extension) {
        if (strcmp(extension, ".png") == 0)
            return IFF_PNG;
        else if (strcmp(extension, ".jpg") == 0 || strcmp(extension, ".jpeg") == 0)
            return IFF_JFIF_JPEG;
        // Add support for other image formats as needed
    }
    // Report unrecognized extensions so the caller's IFF_UNKNOWN check can reject them
    return IFF_UNKNOWN;
}

/// Main entry point
int main(int argc, char* argv[]) {
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " <input_file> <output_file>\n";
        return 1;
    }

    const char* inputFileName = argv[1];
    const char* outputFileName = argv[2];

    // Read input image
    Pix *image = pixRead(inputFileName);
    if (!image) {
        std::cerr << "Error: Could not open input file: " << inputFileName << "\n";
        return 1;
    }

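    // Deskewing parameters (a value of 0 selects Leptonica's default)
    const int     redsweep = 0;       // reduction factor for the linear sweep
    const float   sweeprange = 50;    // half of the full sweep range, in degrees
    const float   sweepdelta = 0.0;   // angle increment of the sweep, in degrees
    const int     redsearch = 0;      // reduction factor for the binary search
    const int     thresh = 0;         // threshold used to binarize the image
    float         angle = 0.0f;       // angle reported by Leptonica, in degrees
    float         conf = 0.0f;        // confidence reported by Leptonica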

    // Deskew the image
    Pix *deskewed = pixDeskewGeneral(image, redsweep, sweeprange, sweepdelta, redsearch, thresh, &angle, &conf);
    pixDestroy(&image); // Release the original image

    if (!deskewed) {
        std::cerr << "Error: Could not deskew the image\n";
        return 1;
    }

    // Determine output image format based on file extension
    l_int32 outputFormat = getFormatFromExtension(outputFileName);
    if (outputFormat == IFF_UNKNOWN) {
        std::cerr << "Error: Unsupported output file format\n";
        pixDestroy(&deskewed); // Release the deskewed image
        return 1;
    }

    // Save the deskewed image
    if (pixWrite(outputFileName, deskewed, outputFormat) != 0) {
        std::cerr << "Error: Could not save the output file: " << outputFileName << "\n";
        pixDestroy(&deskewed); // Release the deskewed image
        return 1;
    }

    std::cout << "Leptonica version: " << getLeptonicaVersion() << "\n"
              << "Deskewing complete, angle = " << angle << "º, confidence = " << conf << ".\n"
              << "Output saved as: " << outputFileName << "\n";


    // Release the deskewed image
    pixDestroy(&deskewed);

    return 0;
}

I'm using sample images of plain black text on a white background in my tests. When the image is rotated by 40° (see here), this is the output from the code above:

./deskew sample40.jpg img40.jpg
Leptonica version: leptonica-1.80.0
Deskewing complete, angle = -39.9844º, confidence = 11.0401.
Output saved as: img40.jpg

The skew angle is detected correctly, the confidence is high, and the output is a deskewed image.

But when the input image is rotated by 45° (see here), this is the output:

./deskew sample45.jpg img45.jpg
Leptonica version: leptonica-1.80.0
Deskewing complete, angle = 12º, confidence = 1.20339.
Output saved as: img45.jpg

The detected skew angle (12°) is NOT correct, and the output is just the input image, still skewed.

I read the function documentation and the source code. This behaviour is expected when the skew angle cannot be detected, but why does detection succeed with text rotated by 40° and fail with the same text rotated by 45°?

Do you have any suggestions to handle these angles?


Solution

  • I posted the question on the official GitHub repository for Leptonica, and Dan Bloomberg provided a clear and concise explanation. In short, Leptonica performs vertical shears instead of true rotations because they are more efficient, so relevant parts of the image may be cropped when the skew angle is large and the image border is not wide enough.

    In my case, I was able to solve the problem by using pixClipToForeground() before trying to deskew the image. This removed the image border and ensured the text in the image wasn't lost in pixDeskewGeneral().

    However, I should note that this is not a universal solution that will work for all images. It only worked for my sample images because the border was straightforward to detect.
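
    To make the cropping effect more concrete, here is a minimal C++ sketch of the geometry only. It does not call Leptonica, and the helper names shearShift and staysInside are hypothetical. It assumes that a vertical shear moves each pixel column up or down by its distance from the shear line times tan(angle), and that anything pushed past the top or bottom edge of the image is lost.

```cpp
#include <cassert>
#include <cmath>

// Vertical displacement, in pixels, of a column located `dx` pixels away
// from the shear line, for a vertical shear of `angleDeg` degrees.
// (Hypothetical helper for illustration; not part of Leptonica.)
double shearShift(double dx, double angleDeg) {
    const double kPi = 3.14159265358979323846;
    return dx * std::tan(angleDeg * kPi / 180.0);
}

// Returns true if a pixel at row `y`, `dx` pixels from the shear line,
// still lands inside an image of `height` rows after the shear.
// (Hypothetical helper for illustration; not part of Leptonica.)
bool staysInside(double y, double dx, double angleDeg, int height) {
    assert(height > 0);
    const double yNew = y + shearShift(dx, angleDeg);
    return yNew >= 0.0 && yNew < height;
}
```

    For example, staysInside(100, 500, 40, 580) is true (a shift of about 420 pixels), while staysInside(100, 500, 45, 580) is false (a shift of about 500 pixels). This only shows how quickly the shift grows with the angle; it is not meant to reproduce the exact point at which pixDeskewGeneral() stops finding the angle.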