I spent half a day trying to find the best way to pre-process images for Tesseract OCR and did not get any good results besides thresholding. Can anybody suggest what kind of steps I should try? OpenCV, ImageMagick, or GIMP are all fine for me as tools. The images can have different backgrounds, but the font and the font color will always be the same. Here are the image samples:
This is what I currently get using threshold filters:
and the text returned by OCR looks like this: "ELIMINATED LIFELINES220_{¢-\"| “, Vv a . —"
I've found a good article that describes a lot of pre-processing steps: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
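The step that worked best for me was the top-hat morphological operation, which transforms each pixel based on its neighboring pixels (it subtracts the morphological opening of the image from the image itself, so small bright details survive and the larger background is suppressed). That can be done using OpenCV: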
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rectKernel)
It can also be done using ImageMagick: http://www.imagemagick.org/Usage/morphology/#top-hat