android ocr tesseract tess-two

Tesseract OCR returns an empty string


I am building an OCR app for Android using the Tesseract OCR engine. Every time I run the engine on a photo, it returns empty text. This is my code:

public String detectText(Bitmap bitmap) {
    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    String mDataDir = setTessData();
    tessBaseAPI.setDebug(true);
    tessBaseAPI.init(mDataDir + File.separator, "eng");
    tessBaseAPI.setImage(bitmap);
    tessBaseAPI.setPageSegMode(TessBaseAPI.OEM_TESSERACT_ONLY);
    String text = tessBaseAPI.getUTF8Text();

    tessBaseAPI.end();

    return text;
}

private String setTessData(){
    String mDataDir = this.getExternalFilesDir("data").getAbsolutePath();
    String mTrainedDataPath = mDataDir + File.separator + "tessdata";
    String mLang = "eng";
    // Check whether the language file already exists in the data folder
    File dir = new File(mTrainedDataPath);
    if (!dir.exists()) {
        if (!dir.mkdirs()) {
            //showDialogFragment(SD_ERR_DIALOG, "sd_err_dialog");
        } else {
        }
    }

    if (!(new File(mTrainedDataPath + File.separator + mLang + ".traineddata")).exists()) {

        // If English or Hebrew, we just copy the file from assets
        if (mLang.equals("eng") || mLang.equals("heb")){
            try {
                AssetManager assetManager = context.getAssets();
                InputStream in = assetManager.open(mLang + ".traineddata");
                OutputStream out = new FileOutputStream(mTrainedDataPath + File.separator + mLang + ".traineddata");
                copyFile(in, out);
                //Toast.makeText(context, getString(R.string.selected_language) + " " + mLangArray[mLangID], Toast.LENGTH_SHORT).show();
                //Log.v(TAG, "Copied " + mLang + " traineddata");
            } catch (IOException e) {
                //showDialogFragment(SD_ERR_DIALOG, "sd_err_dialog");
            }
        }

        else{

            // Checking if Network is available
            if (!isNetworkAvailable(this)){
                //showDialogFragment(NETWORK_ERR_DIALOG, "network_err_dialog");
            }
            else {
                // Shows a dialog with the file size. When the user clicks OK, the download starts; if they press Cancel, revert to English (as on a network error)
                //showDialogFragment(CONTINUE_DIALOG, "continue_dialog");
            }
        }
    }
    else {
        //Toast.makeText(mThis, getString(R.string.selected_language) + " " + mLangArray[mLangID], Toast.LENGTH_SHORT).show();
    }
    return mDataDir;
}

I have debugged it many times, and the bitmap is passed correctly to the detectText method. The language data files (tessdata) exist on the phone, and the path to them is also correct.

Does anybody know what the problem is here?


Solution

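  • In your detectText() method, you are passing an OCR engine mode (OEM) constant to setPageSegMode(), which expects a page segmentation mode (PSM) value. OEM_TESSERACT_ONLY has the value 0, the same as PSM_OSD_ONLY (orientation and script detection only), so no text recognition is performed.

    detectText(Bitmap bitmap) {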
        ...
        tessBaseAPI.setPageSegMode(TessBaseAPI.OEM_TESSERACT_ONLY);
    }
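
Because both kinds of constants are plain ints, the compiler cannot catch an engine-mode value used as a segmentation mode. The following is a standalone sketch, not the tess-two API: the classes EngineMode, SegMode and PsmCollisionDemo are hypothetical and only mirror a few constants. The PSM values are those of PageSegMode in TessBaseAPI.java; OEM_TESSERACT_ONLY = 0 is assumed from the tess-two source and is worth checking against the version you use.

```java
// Standalone illustration: these classes mirror int constants only
// and do not call the real tess-two library.
class PsmCollisionDemo {
    public static void main(String[] args) {
        // The engine-mode constant, read as a page segmentation mode.
        System.out.println(SegMode.describe(EngineMode.OEM_TESSERACT_ONLY));
        // The mode suggested as an example fix.
        System.out.println(SegMode.describe(SegMode.PSM_AUTO));
    }
}

final class EngineMode {
    // Assumed value of TessBaseAPI.OEM_TESSERACT_ONLY in tess-two.
    static final int OEM_TESSERACT_ONLY = 0;
}

final class SegMode {
    // Values as defined in TessBaseAPI.PageSegMode.
    static final int PSM_OSD_ONLY = 0;
    static final int PSM_AUTO = 3;

    // Maps a raw int to the segmentation mode it selects.
    static String describe(int mode) {
        switch (mode) {
            case PSM_OSD_ONLY:
                return "PSM_OSD_ONLY (orientation and script detection only)";
            case PSM_AUTO:
                return "PSM_AUTO (fully automatic page segmentation, no OSD)";
            default:
                return "other mode " + mode;
        }
    }
}
```

Under these assumptions, the first line printed names PSM_OSD_ONLY, which is the mode the original call ends up selecting.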
    

    Choose a page segmentation mode that suits the kind of image you are recognizing; an appropriate mode helps the engine detect the characters.

    For example:

    tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
    

    The other page segmentation modes are defined in TessBaseAPI.java:

    /** Page segmentation mode. */
    public static final class PageSegMode {
        /** Orientation and script detection only. */
        public static final int PSM_OSD_ONLY = 0;
    
        /** Automatic page segmentation with orientation and script detection. (OSD) */
        public static final int PSM_AUTO_OSD = 1;
    
        /** Fully automatic page segmentation, but no OSD, or OCR. */
        public static final int PSM_AUTO_ONLY = 2;
    
        /** Fully automatic page segmentation, but no OSD. */
        public static final int PSM_AUTO = 3;
    
        /** Assume a single column of text of variable sizes. */
        public static final int PSM_SINGLE_COLUMN = 4;
    
        /** Assume a single uniform block of vertically aligned text. */
        public static final int PSM_SINGLE_BLOCK_VERT_TEXT = 5;
    
        /** Assume a single uniform block of text. (Default.) */
        public static final int PSM_SINGLE_BLOCK = 6;
    
        /** Treat the image as a single text line. */
        public static final int PSM_SINGLE_LINE = 7;
    
        /** Treat the image as a single word. */
        public static final int PSM_SINGLE_WORD = 8;
    
        /** Treat the image as a single word in a circle. */
        public static final int PSM_CIRCLE_WORD = 9;
    
        /** Treat the image as a single character. */
        public static final int PSM_SINGLE_CHAR = 10;
    
        /** Find as much text as possible in no particular order. */
        public static final int PSM_SPARSE_TEXT = 11;
    
        /** Sparse text with orientation and script detection. */
        public static final int PSM_SPARSE_TEXT_OSD = 12;
    
        /** Number of enum entries. */
        public static final int PSM_COUNT = 13;
    }
    

    You can experiment with different page segmentation enum values and see which gives the best result.
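
One way to run that experiment is to loop over a few candidate modes and compare what each returns. The sketch below is only an illustration under stated assumptions: PsmExperiment and firstNonBlank are hypothetical names, the recognizer is passed in as an IntFunction<String> so the code runs without Android, and "first non-blank result" is a crude stand-in for judging quality by eye.

```java
import java.util.function.IntFunction;

class PsmExperiment {
    // Values copied from TessBaseAPI.PageSegMode.
    static final int PSM_AUTO = 3;
    static final int PSM_SINGLE_BLOCK = 6;
    static final int PSM_SINGLE_LINE = 7;

    /**
     * Calls the recognizer once per mode, in order, and returns the first
     * non-blank result, or an empty string if every mode yields nothing.
     */
    static String firstNonBlank(int[] modes, IntFunction<String> recognize) {
        for (int mode : modes) {
            String text = recognize.apply(mode);
            if (text != null && !text.trim().isEmpty()) {
                return text;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Fake recognizer standing in for a real Tesseract call:
        // it only "finds" text in single-line mode.
        IntFunction<String> fake = mode -> mode == PSM_SINGLE_LINE ? "hello" : "";
        int[] modes = {PSM_AUTO, PSM_SINGLE_BLOCK, PSM_SINGLE_LINE};
        System.out.println(firstNonBlank(modes, fake));
    }
}
```

In the app, the recognizer passed in would call tessBaseAPI.setPageSegMode(mode), then tessBaseAPI.setImage(bitmap) and tessBaseAPI.getUTF8Text(), on an instance that has already been initialized with init().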