Tags: android, image-processing, ocr, tesseract, tess-two

Types of filters used by Tesseract


I'm testing how much I can improve Tesseract OCR recognition results by applying different filters during image preprocessing. To do this properly, I need to know what filtering Tesseract already does on its own. Judging by the results, the only filtering is most probably a conversion to grayscale followed by a binary threshold.
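For reference, the two steps suspected above (grayscale conversion, then a binary threshold) can be sketched as follows. This is a minimal illustration, not Tesseract's code: the BT.601 luma weights and the fixed threshold of 128 are assumptions chosen for the example.

```python
def to_grayscale(rgb_pixels):
    """Convert (r, g, b) tuples to 8-bit gray values.

    Uses ITU-R BT.601 luma weights; this is an assumption for illustration,
    Tesseract/Leptonica may weight the channels differently.
    """
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def binarize(gray_pixels, threshold=128):
    """Map each gray value to 0 (black) or 255 (white) with a fixed global threshold."""
    return [255 if g >= threshold else 0 for g in gray_pixels]


pixels = [(255, 255, 255), (0, 0, 0), (200, 30, 30), (30, 200, 30)]
gray = to_grayscale(pixels)
print(gray)
print(binarize(gray))
```

A fixed threshold like this is the simplest possible choice; the point of comparing your own filters against it is that a poorly chosen threshold can erase thin strokes or merge characters.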

Does anyone know what types of filters are used, or where I can find this kind of information?


Solution

  • Tesseract v3 uses Otsu thresholding if I'm not mistaken.

    You can call the getThresholdedImage() method (on TessBaseAPI in tess-two) to see the binarized image that Tesseract actually works on.

    As you've seen, the link @Piglet posted above may also help: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
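To make the Otsu step concrete, here is a sketch of textbook global Otsu thresholding on 8-bit gray values. It is not taken from Tesseract's source; details such as how Tesseract handles color channels or ties may differ. The function name otsu_threshold is hypothetical. The method picks the threshold that maximizes the between-class variance of the two resulting pixel groups.

```python
def otsu_threshold(gray_pixels):
    """Return t in 1..255 maximizing between-class variance.

    Pixels with value < t form the background class, pixels >= t the
    foreground class. Raw counts are used instead of probabilities, which
    scales the variance by total**2 but does not change the argmax.
    """
    hist = [0] * 256
    for g in gray_pixels:
        hist[g] += 1
    total = len(gray_pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # number of pixels with value < t
    sum_bg = 0     # sum of intensities of those pixels
    for t in range(1, 256):
        weight_bg += hist[t - 1]
        sum_bg += (t - 1) * hist[t - 1]
        weight_fg = total - weight_bg
        if weight_bg == 0 or weight_fg == 0:
            continue  # one class is empty, so this split is meaningless
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:  # keep the first maximum on ties
            best_t, best_var = t, between_var
    return best_t


pixels = [12, 15, 10, 200, 210, 205, 14, 198]
t = otsu_threshold(pixels)
binary = [255 if p >= t else 0 for p in pixels]
print(t, binary)
```

Unlike a fixed threshold, Otsu adapts to each image's histogram, which works well when text and background form two clear intensity peaks but can struggle with uneven lighting, since a single global threshold is applied to the whole image.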