I'm scraping a web page that protects its information from scraping by displaying it as an image. When I load an image in .gif format,
I get an error. I have tried to find a way to convert the loaded image to another format such as PNG, but without success.
import urllib.request
from PIL import Image
import pytesseract
import cv2
import imutils
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
url = "url.gif"
rawimge = Image.open(urllib.request.urlopen(url))
image = imutils.resize(rawimge, width=400)
blur = cv2.GaussianBlur(image, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh
phone = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(phone)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Error:
AttributeError: 'GifImageFile' object has no attribute 'shape'
In the end I could not convert the in-memory image object to another format, so I saved the image to a file and worked from that file instead.
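# Download the GIF, let PIL re-encode it as a PNG on disk, then load that file with OpenCV.
img = Image.open(urllib.request.urlopen(url))
img.save("image.png")
image = cv2.imread("image.png", 0)  # 0 = cv2.IMREAD_GRAYSCALE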
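The round trip through a file may not be strictly necessary. The AttributeError mentions `shape`, which is an attribute of NumPy arrays and not of PIL images, so converting the PIL image to an array in memory could be enough. Below is a minimal sketch under that assumption. The helper name `gif_to_gray_array` and its `source` parameter are made up for illustration, NumPy is assumed to be installed, and the demo builds a small GIF in memory instead of downloading one.

```python
import io

import numpy as np
from PIL import Image


def gif_to_gray_array(source):
    """Decode the first frame of a GIF into a 2-D uint8 NumPy array.

    `source` is a hypothetical parameter: a file path or a binary
    file-like object, e.g. io.BytesIO(urllib.request.urlopen(url).read()).
    """
    with Image.open(source) as img:
        img.seek(0)              # first frame, in case the GIF is animated
        gray = img.convert("L")  # palette ("P") mode -> 8-bit grayscale
    return np.array(gray)        # NumPy array, so it has a .shape attribute


# Self-contained demo: create a 40x20 GIF in memory (no network access).
buffer = io.BytesIO()
Image.new("P", (40, 20), color=0).save(buffer, format="GIF")
buffer.seek(0)

demo = gif_to_gray_array(buffer)
print(demo.shape, demo.dtype)
```

The returned array is single-channel and 8-bit, which is the kind of input `cv2.threshold` with `cv2.THRESH_OTSU` expects, so it should be usable in place of `rawimge` in the `imutils.resize` call. I have not tested this against the actual site's images.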