android opencv ocr tesseract

Text cleaner in OpenCV like the ImageMagick script


I am trying to make the text in an image cleaner and clearer before running OCR with Tesseract. In this link, they provide a good script that does this with ImageMagick. Is it possible to convert this script and its functions into OpenCV code? For example, the script is called with arguments like this:

-g -e none -f 15 -o 20

From the explanation:

-g ...................... convert document to grayscale before enhancing
-e .... enhance ......... enhance image brightness before cleaning;
                       choices are: none, stretch or normalize; 
                       default=none
-f .... filtersize ...... size of filter used to clean background;
                       integer>0; default=15
-o .... offset .......... offset of filter in percent used to reduce noise;
                      integer>=0; default=5

How can I do the same in OpenCV code? As I am a newbie in OpenCV, I only know how to convert to grayscale. Any help would be appreciated.


Solution

  • You will have to check the ImageMagick documentation to find the exact algorithms used, but here is a rough guess:

    -g ...................... convert document to grayscale before enhancing
    

    That would be either cv::cvtColor with the BGR2GRAY conversion (cv::COLOR_BGR2GRAY) or, even better, loading your image directly in grayscale with cv::imread(filename, CV_LOAD_IMAGE_GRAYSCALE). Newer OpenCV versions spell this flag cv::IMREAD_GRAYSCALE.

    -e .... enhance ......... enhance image brightness before cleaning;
                           choices are: none, stretch or normalize; 
                           default=none
    

    Since you chose "none", nothing needs to be done here. Otherwise, use cv::equalizeHist (tutorial).

    -f .... filtersize ...... size of filter used to clean background;
                           integer>0; default=15
    -o .... offset .......... offset of filter in percent used to reduce noise;
                          integer>=0; default=5
    

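    My guess for the last two parameters is cv::adaptiveThreshold, with -f corresponding to the blockSize parameter in OpenCV and -o to the constant C. Note that blockSize must be odd and greater than 1, and that C is an absolute value subtracted from the local (weighted) mean rather than a percentage, so the -o value would need rescaling. Which adaptive thresholding method is actually used (Gaussian or mean) is something you need to check in the ImageMagick documentation.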
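
To make the roles of the two parameters concrete, here is a minimal sketch in plain C++ (no OpenCV needed) of what a mean adaptive threshold does with a block size and an offset. It is only an illustration of the guess above, not a port of the ImageMagick script. The helpers toOddBlockSize and percentToOffset are hypothetical names, and reading the -o percentage as a fraction of the 8-bit range is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: cv::adaptiveThreshold needs an odd blockSize > 1,
// so round an arbitrary -f value up to the nearest valid size.
int toOddBlockSize(int filterSize) {
    const int size = std::max(filterSize, 3);
    return (size % 2 == 0) ? size + 1 : size;
}

// Hypothetical helper: ASSUMES the -o percentage is relative to the full
// 8-bit range, so 20 percent becomes 51 grey levels.
double percentToOffset(double percent) {
    return percent * 255.0 / 100.0;
}

// Mean adaptive threshold on a row-major 8-bit grayscale image.
// A pixel becomes white (255) when it is brighter than the mean of its
// blockSize x blockSize neighbourhood minus c, and black (0) otherwise.
std::vector<std::uint8_t> meanAdaptiveThreshold(
        const std::vector<std::uint8_t>& gray, int width, int height,
        int blockSize, double c) {
    const int radius = blockSize / 2;
    std::vector<std::uint8_t> out(gray.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            long sum = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    // Clamp coordinates so border pixels are replicated.
                    const int yy = std::min(std::max(y + dy, 0), height - 1);
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    sum += gray[yy * width + xx];
                }
            }
            const double mean =
                static_cast<double>(sum) / (blockSize * blockSize);
            out[y * width + x] = (gray[y * width + x] > mean - c) ? 255 : 0;
        }
    }
    return out;
}
```

For -f 15 -o 20 this would be called as meanAdaptiveThreshold(gray, width, height, toOddBlockSize(15), percentToOffset(20)). Under the same assumptions, the corresponding OpenCV call would be cv::adaptiveThreshold(gray, out, 255, cv::ADAPTIVE_THRESH_MEAN_C, cv::THRESH_BINARY, 15, 51), which is much faster than this naive loop. Treat the offset value as a starting point to tune by eye.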