I'm trying to scrape some data from a website using Python and Beautiful Soup, specifically an image in base64 format. However, when I run my code, the image data appears in a strange format like this:
"image": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
Here's the relevant code snippet:
import requests
from bs4 import BeautifulSoup

def search_mercadolivre_by_category(category):
    url = f"https://lista.mercadolivre.com.br/{category}"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Each search result is rendered as one <li> item
    products = soup.find_all("li", {"class": "ui-search-layout__item"})
    results = []
    for product in products:
        title = product.find("h2", {"class": "ui-search-item__title"}).text.strip()
        price = product.find("span", {"class": "price-tag-fraction"}).text.strip()
        link = product.find("a", {"class": "ui-search-link"})['href']
        # This is the value that comes back as a base64 data URI
        image = product.find("img")['src']
        results.append({
            "title": title,
            "price": price,
            "link": link,
            "image": image,
            "category": category,
            "website": "Mercado Livre",
            "keyword": ""
        })
    return results
Can anyone help me decode the image data properly?
I was expecting to get the image URL from the src attribute, as in this element:
<img width="160" height="160" decoding="async" src="https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp" class="ui-search-result-image__element shops__image-element" alt="Samsung Galaxy M13 Dual SIM 128 GB verde 4 GB RAM">
That's a data URI: the image bytes are embedded directly in the attribute as base64 text. The simplest way to read it is:
from urllib import request
with request.urlopen('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7') as DataURI:
    im = DataURI.read()
If you look at the first few bytes, you can see it is indeed a 1x1 GIF image:
print(im[:10]) # prints b'GIF89a\x01\x00\x01\x00'
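If you would rather not go through urllib, the same bytes can be recovered by hand. The following is a minimal sketch, assuming the URI always has the `data:<type>;base64,<payload>` shape seen in the question; `decode_data_uri` is a hypothetical helper name, not part of any library. It also reads the width and height out of the GIF header to confirm the 1x1 size.

```python
import base64
import struct

def decode_data_uri(uri):
    """Split a base64 data URI into (media_type, raw_bytes).

    Hypothetical helper: only handles the 'data:<type>;base64,<payload>' form.
    """
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64-encoded data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)

uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
media_type, im = decode_data_uri(uri)

# A GIF file starts with a 6-byte signature, followed by the width and
# height stored as little-endian unsigned 16-bit integers.
width, height = struct.unpack("<HH", im[6:10])
print(media_type, im[:6], width, height)
```

Either approach gives you the same `im` bytes, so the snippets that follow work unchanged.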
If you want to save it to disk as image.gif, you can use:
from pathlib import Path
Path('image.gif').write_bytes(im)
If you want to open it in PIL, you can wrap it in a BytesIO and open it like this:
from PIL import Image
from io import BytesIO
# Open as PIL Image
PILImage = Image.open(BytesIO(im))
PILImage.show() # display in viewer
PILImage.save('result.png') # save to disk as PNG
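Note that decoding only ever yields the 1x1 GIF, not the product photo you were expecting. A 1x1 image like this is commonly a lazy-loading placeholder, with the real URL kept in another attribute until the browser swaps it in. Here is a rough sketch of a fallback lookup under that assumption. The attribute names `data-src` and `data-original` are guesses, not something confirmed for this site, and `pick_image_url` is a hypothetical helper; inspect the HTML in `response.content` to see what is actually there. If the URL is only filled in by JavaScript, it will not appear in the downloaded HTML at all.

```python
def pick_image_url(attrs, candidates=("data-src", "data-original", "src")):
    """Return the first attribute value that is not an inline data URI.

    `attrs` can be anything with a dict-style .get(), such as a plain dict
    or a Beautiful Soup Tag. The candidate names are assumptions.
    """
    for name in candidates:
        value = attrs.get(name)
        if value and not value.startswith("data:"):
            return value
    return None

# Stand-ins for <img> attributes, using plain dicts for illustration
lazy_img = {
    "src": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
    "data-src": "https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp",
}
plain_img = {"src": "https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp"}
placeholder_only = {"src": lazy_img["src"]}

print(pick_image_url(lazy_img))
print(pick_image_url(plain_img))
print(pick_image_url(placeholder_only))
```

In the scraping loop this would replace the direct `['src']` lookup, for example `image = pick_image_url(product.find("img"))`, since a Tag also supports `.get()`.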