I am a beginner in programming, and this is my first small project. I've hit a bottleneck and would like to ask for help. Any advice is welcome. Thank you in advance!
Here is what I want to do:
I want to build a text detection application and extract the text for further use (for instance, to map it to other relevant information in a dataset). I divided this into two steps: 1. detect the text; 2. extract the text and use regular expressions to rearrange it for the data mapping.
For the first step, I use the Google Vision API, so I have no problem reading the image from Google Cloud Storage (code reference 1).
However, when it comes to step two, I need the PIL module to open the file so that I can draw the text on it. The method Image.open() requires a path. My question is: how do I pass the Cloud Storage path to it? (code reference 2)
code reference 1:
from google.cloud import vision
image_uri = 'gs://img_platecapture/img_001.jpg'
client = vision.ImageAnnotatorClient()
image = vision.Image()
image.source.image_uri = image_uri ## <- THE PATH ##
response = client.text_detection(image=image)
for text in response.text_annotations:
    print('=' * 30)
    print(text.description)
    vertices = ['(%s,%s)' % (v.x, v.y) for v in text.bounding_poly.vertices]
    print('bounds:', ",".join(vertices))
if response.error.message:
    raise Exception(
        '{}\nFor more info on error messages, check: '
        'https://cloud.google.com/apis/design/errors'.format(
            response.error.message))
code reference 2:
from PIL import Image, ImageDraw
from PIL import ImageFont
import re
img = Image.open(?)  ## <- THE PATH ##
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("simsun.ttc", 18)
for text in response.text_annotations[1:]:
    ocr = text.description
    bound = text.bounding_poly
    draw.text((bound.vertices[0].x - 25, bound.vertices[0].y - 25), ocr, fill=(255, 0, 0), font=font)
    draw.polygon(
        [
            bound.vertices[0].x,
            bound.vertices[0].y,
            bound.vertices[1].x,
            bound.vertices[1].y,
            bound.vertices[2].x,
            bound.vertices[2].y,
            bound.vertices[3].x,
            bound.vertices[3].y,
        ],
        None,
        'yellow',
    )
texts = response.text_annotations
a = str(texts[0].description.split())
b = re.sub(u"([^\u4e00-\u9fa5\u0030-u0039])", "", a)
b1 = "".join(b)
regex1 = re.search(r"\D{1,2}Dist.", b)
if regex1:
    regex1 = "{}".format(regex1.group(0))
.........
PIL does not have a built-in ability to open files from GCS automatically. You will need to either:
1. Download the file to local storage and point PIL to that file, or
2. Give PIL a BlobReader, which it can use to access the data:
from PIL import Image
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.bucket('img_platecapture')
# Use get_blob to pin the generation number, so we don't get corruption
# if the blob is overwritten while we read it.
blob = bucket.get_blob('img_001.jpg')
# Open in binary mode: PIL needs bytes, and blob.open() defaults to text mode.
with blob.open("rb") as file:
    img = Image.open(file)
    # ...
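If you would rather keep the single gs:// URI that you already pass to the Vision API, here is a rough sketch of one way to reuse it. The helper names split_gs_uri and load_image_from_gcs are my own (hypothetical, not part of any library), and I am assuming the URI always has the form gs://bucket/object. The second helper downloads the whole object into memory, which is an alternative to streaming it through blob.open().

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, blob_name).

    Hypothetical helper: plain string handling, no network access.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object name.
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name or not blob_name:
        raise ValueError(f"URI must contain a bucket and an object name: {uri!r}")
    return bucket_name, blob_name


def load_image_from_gcs(uri):
    """Download the object into memory and open it with PIL (hypothetical helper)."""
    # Imported inside the function so split_gs_uri stays usable
    # even where these packages are not installed.
    import io
    from google.cloud import storage
    from PIL import Image

    bucket_name, blob_name = split_gs_uri(uri)
    blob = storage.Client().bucket(bucket_name).get_blob(blob_name)
    if blob is None:  # get_blob returns None when the object does not exist
        raise FileNotFoundError(uri)
    # Image.open accepts any binary file-like object, not only a path.
    return Image.open(io.BytesIO(blob.download_as_bytes()))
```

With this, img = load_image_from_gcs(image_uri) could stand in for the Image.open(?) line in code reference 2, assuming your credentials allow reading the bucket. For very large images, streaming through blob.open("rb") avoids holding the raw bytes and the decoded image in memory at the same time.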