python, captcha, text-recognition

How to get the captcha digits separately using Python


I have a specific kind of 3-digit captcha, like this:

[image: example captcha]

I am trying to separate the 3 digits. I tried the pytesseract module to recognize the text in the image, but it is not very accurate. After some research, I found out that I could make the background completely white, crop away all the extra space, and then divide the picture into 3 pieces, which would most likely give me what I need. So I'm looking for a way to apply this filter, crop the image, and slice it into three pieces.

I found out that the PIL module can load the image in Python:

from PIL import Image

im = Image.open("captcha.jpg")

I'm looking for a way to make the background totally white, crop the extra space, and divide the picture into three pieces. Thanks in advance for your guidance.


Solution

  • I found a library called cv2 (the OpenCV Python module) that has a function called threshold:

    For every pixel, the same threshold value is applied. If the pixel value is smaller than the threshold, it is set to 0, otherwise it is set to a maximum value.

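    import cv2 as cv
    img = cv.imread('gradient.png', 0)  # 0 loads the image as grayscale
    # pixels above 127 become 255 (white), all others become 0 (black)
    ret, thresh1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY)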
    

    In the example above, it takes a grayscale image and makes every pixel at or below 127 completely black (0), while every pixel above 127 becomes completely white (255).

    Further reading:

    https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
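
Thresholding only covers the "white background" part of the question. Below is a minimal sketch of the remaining steps (crop the extra space, then cut the image into three pieces), written with NumPy so the logic is easy to follow. It assumes dark digits on a lighter background and digits of roughly equal width. The helper names `binarize`, `crop_to_ink` and `split_digits` are made up for this example, and the threshold of 127 will probably need tuning for a real captcha.

```python
import numpy as np


def binarize(gray, thresh=127):
    """Same rule as cv.THRESH_BINARY: pixels > thresh become 255, the rest 0."""
    return np.where(gray > thresh, 255, 0).astype(np.uint8)


def crop_to_ink(binary):
    """Cut away the all-white rows and columns around the dark pixels."""
    ink = binary == 0
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        raise ValueError("no dark pixels found; try a different threshold")
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def split_digits(cropped, count=3):
    """Split the image into `count` vertical strips of near-equal width."""
    return np.array_split(cropped, count, axis=1)


# Synthetic stand-in for a captcha: light-gray background, one dark block.
gray = np.full((10, 20), 200, dtype=np.uint8)
gray[2:8, 4:16] = 30

pieces = split_digits(crop_to_ink(binarize(gray)))
print([p.shape for p in pieces])  # [(6, 4), (6, 4), (6, 4)]
```

For a real file, `gray` could come from `cv.imread('captcha.jpg', 0)` as in the snippet above. Equal-width slicing is the simplest option; if the digits are unevenly spaced or touch each other, the cut points would have to be chosen differently.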