I have 107 images that I want to extract text from using the Gemini API. This is my code so far:
# Gemini Model
model = genai.GenerativeModel('gemini-pro-vision', safety_settings=safety_settings)
# Code
images_to_process = [os.path.join(image_dir, image_name) for image_name in os.listdir(image_dir)]  # list of 107 images
prompt = """Carefully scan this image: if it has text, extract all the text and return the text from it. If the image does not have text return '<000>'."""
for image_path in tqdm(images_to_process):
    img = Image.open(image_path)
    output = model.generate_content([prompt, img])
    text = output.text
    print(text)
This code sends one image at a time to Gemini and extracts the text from it.
Problem: with 107 images, this code takes about 10 minutes to run. I know that the Gemini API can handle 60 requests per minute. How can I send 60 images at the same time, i.e. process them as a batch?
2024-10 update: I've added a Cookbook Quickstart on asynchronous requests to show how this works. The advice below is still correct.
In synchronous Python, you can use something like a ThreadPoolExecutor to make your requests in separate threads.
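As a rough sketch of the thread-based approach, the helper below fans the calls out over a pool and returns the results in input order. `run_in_threads` and `extract_text` are hypothetical names: `extract_text` stands in for a function that wraps `model.generate_content([prompt, img])` and returns `.text`, and `max_workers=8` is an arbitrary assumption rather than a recommended value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_in_threads(
    paths: Iterable[str],
    extract_text: Callable[[str], str],
    max_workers: int = 8,
) -> List[str]:
    """Call extract_text on every path using a pool of worker threads.

    Results come back in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; an exception raised by a worker
        # is re-raised when its result is reached while building the list.
        return list(pool.map(extract_text, paths))


if __name__ == "__main__":
    # Hypothetical stand-in for the real API call, e.g.
    #   model.generate_content([prompt, Image.open(path)]).text
    def fake_extract(path: str) -> str:
        return f"text from {path}"

    print(run_in_threads(["a.jpg", "b.jpg"], fake_extract))
```

Passing the per-image function in as an argument keeps the threading logic separate from the API call, so it can be tried out with a stub before spending quota.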
The Gemini Python SDK has an async API though, which can be a bit more natural:
$ python -m asyncio
>>> import asyncio
>>> import google.generativeai as genai
>>> import PIL.Image
>>> model = genai.GenerativeModel('gemini-pro-vision')
>>> imgs = ['/path/img.jpg', ...]
>>> prompt = "..."
>>> async def process_image(img: str) -> str:
...     r = await model.generate_content_async([prompt, PIL.Image.open(img)])
...     # TODO: error handling
...     return r.text
...
>>> jobs = asyncio.gather(*[process_image(img) for img in imgs])
>>> results = await jobs # or run_until_complete(jobs)
>>> results
['text is here', ...]
This uses the implicit event loop of the asyncio REPL; in a real app you'll need to set up and use your own event loop.
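For a standalone script, one way to get an event loop is `asyncio.run`. The sketch below is an illustration under assumptions, not the SDK's prescribed pattern: `process_all`, `fake_extract`, and `max_in_flight` are hypothetical names, and the semaphore is an addition that only caps how many requests are in flight at once. It is not a per-minute rate limiter.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def process_all(
    paths: Sequence[str],
    extract_text: Callable[[str], Awaitable[str]],
    max_in_flight: int = 10,  # arbitrary assumption, tune for your quota
) -> List[str]:
    """Run extract_text for every path, at most max_in_flight at a time."""
    limiter = asyncio.Semaphore(max_in_flight)

    async def bounded(path: str) -> str:
        # Wait for a free slot before starting the request.
        async with limiter:
            return await extract_text(path)

    # gather returns the results in the same order as the input paths.
    return await asyncio.gather(*(bounded(p) for p in paths))


async def fake_extract(path: str) -> str:
    # Hypothetical stand-in for the real call, e.g.
    #   (await model.generate_content_async([prompt, PIL.Image.open(path)])).text
    await asyncio.sleep(0)
    return f"text from {path}"


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs the coroutine, then closes the loop.
    print(asyncio.run(process_all(["a.jpg", "b.jpg"], fake_extract)))
```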
See also TaskGroups.