java, c++, java-native-interface, tesseract, tess-two

bytes per pixel, bytes per line - How to use function nativeSetImageBytes in tessbaseapi.cpp of tess-two?


We are parsing an image of a text snippet with a resolution of 2121x105 px. In Java we use the following code to get a byte array (one of our constraints is to work with a byte array here):

import org.apache.commons.io.IOUtils;

...

InputStream is = getAssets().open("images/text.png");
byte[] bytes = IOUtils.toByteArray(is);

This byte array is then passed to the native C++ code. We are not using the Java wrapper of tess-two, but we do use its native libraries. In the native code we try to get the text of the image with GetUTF8Text(). We then saw that tess-two already has an implementation for setting the image from a byte array:

void Java_com_..._TessBaseAPI_nativeSetImageBytes(JNIEnv *env,
                                                  jobject thiz,
                                                  jlong mNativeData,
                                                  jbyteArray data,
                                                  jint width,
                                                  jint height,
                                                  jint bpp,
                                                  jint bpl) {

...

We figured that bpp for a PNG should be 4 (RGBA). It is not clear, though, what is expected for bpl. If we set it to the image width multiplied by bpp, we get a segmentation fault. If we set it to zero, an empty string is returned.

UPDATE: The segmentation fault is thrown in GetUTF8Text(), not in SetImage().

SIGSEGV (signal SIGSEGV: invalid address (fault address: 0xc))

Solution

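  • tess-two, which uses Tesseract OCR, expects a decoded image in RGBA, RGB, or grayscale format.

    The bytes returned by IOUtils.toByteArray are the still-compressed PNG file, not pixel data. That buffer is most likely much smaller than height * bpl, which would explain the crash when the image is actually read in GetUTF8Text().

    So you need to decode your PNG (this question explains how to do it in Java) and convert the result to a byte array.

    bpp is bytes per pixel. For RGBA it is 4 (byte 1 is red, byte 2 is green, byte 3 is blue, byte 4 is alpha), for RGB it is 3 (byte 1 is red, byte 2 is green, byte 3 is blue), and for grayscale it is 1.

    bpl is bytes per line = bpp * image width (assuming the rows are tightly packed, with no padding).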
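
The layout described above can be sketched in plain Java. This is only an illustration under assumptions, not tess-two code: the class `PixelBuffers` and its methods are hypothetical names, and the input is assumed to be already-decoded pixels as packed ARGB ints (the format that, for example, Android's `Bitmap.getPixels` fills in). The sketch repacks them into a tightly packed RGBA byte array and computes the matching bpl.

```java
// Hypothetical helper, not part of tess-two: packs decoded ARGB pixels
// into the RGBA byte layout (4 bytes per pixel, rows tightly packed).
final class PixelBuffers {

    static final int BPP_RGBA = 4; // bytes per pixel for RGBA

    /** Bytes per line for a tightly packed image (no row padding). */
    static int bytesPerLine(int width, int bpp) {
        return width * bpp;
    }

    /**
     * Converts packed ARGB ints (0xAARRGGBB, one per pixel, row-major)
     * into a byte array ordered red, green, blue, alpha per pixel.
     */
    static byte[] argbToRgba(int[] argb, int width, int height) {
        if (argb.length != width * height) {
            throw new IllegalArgumentException("expected width * height pixels");
        }
        byte[] out = new byte[argb.length * BPP_RGBA];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int o = i * BPP_RGBA;
            out[o]     = (byte) (p >> 16);  // red
            out[o + 1] = (byte) (p >> 8);   // green
            out[o + 2] = (byte) p;          // blue
            out[o + 3] = (byte) (p >>> 24); // alpha
        }
        return out;
    }

    public static void main(String[] args) {
        int width = 2, height = 1;
        int[] pixels = {0xFF112233, 0x80AABBCC}; // two sample ARGB pixels
        byte[] rgba = argbToRgba(pixels, width, height);
        int bpl = bytesPerLine(width, BPP_RGBA);
        System.out.println(rgba.length + " bytes, bpl = " + bpl); // prints "8 bytes, bpl = 8"
    }
}
```

For the 2121x105 image from the question, this gives bpl = 2121 * 4 = 8484, and the decoded buffer should hold 105 * 8484 = 890820 bytes. A quick sanity check before calling the native function is to compare the array length against height * bpl; the raw bytes of a PNG file will normally not match.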