I'm new to Python and Selenium. My goal is to download an image from the Google Image Search results page and save it as a file in a local directory, but so far I haven't been able to download the image at all.
I'm aware there are other options (retrieving the image via its URL with requests, etc.), but I want to know whether it's possible to do it using the image's "src" attribute, e.g., "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxM..."
My code is below (I have removed the imports, etc., for brevity):
# This creates the folder to store the image in
if not os.path.exists(save_folder):
    os.mkdir(save_folder)
driver = webdriver.Chrome(PATH)
# Goes to the given web page
driver.get("https://www.google.com/imghp?hl=en&ogbl")
# "q" is the name of the google search field input
search_bar = driver.find_element_by_name("q")
# Input the search term(s)
search_bar.send_keys("Ben Folds Songs for Silverman Album Cover")
# Returns the results (basically clicks "search")
search_bar.send_keys(Keys.RETURN)
# Wait 10 seconds for the images to load on the page before moving on to the next part of the script
try:
    # Returns a list
    search_results = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "islrg"))
    )
    # print(search_results.text)
    # Gets all of the images on the page (it should be a list)
    images = search_results.find_elements_by_tag_name("img")
    # I just want the first result
    image = images[0].get_attribute('src')
    ### Need help here ###
except:
    print("Error")
    driver.quit()
# Closes the browser
driver.quit()
I have tried:
urllib.request.urlretrieve(image, "00001.jpg")
and
urllib3.request.urlretrieve(image, f"{save_folder}/captcha.png")
But I've always hit the "except" block using those methods. After reading a promising post, I also tried:
bufferedImage = imageio.read(image)
outputFile = f"{save_folder}/image.png"
imageio.write(bufferedImage, "png", outputFile)
with similar results, though I believe the latter example used Java in the post and I may have made an error in translating it to Python.
I'm sure it's something obvious, but what am I doing wrong? Thank you for any help.
The URL you are dealing with in this case is a data URL: the image data itself, base64-encoded and embedded directly in the URL.
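To make the structure of such a URL concrete, here is a minimal sketch that splits one into its parts. The helper name `split_data_url` is hypothetical (it is not part of any library), and the sample payload is just the base64 form of the 8-byte PNG signature rather than a complete image.

```python
def split_data_url(url):
    """Split a data URL into (media_type, is_base64, payload)."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    # Everything before the first comma is metadata; everything after is the payload.
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no comma separator")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # RFC 2397 falls back to text/plain when the media type is omitted.
    media_type = header.split(";")[0] or "text/plain"
    return media_type, is_base64, payload


# Sample input: the PNG file signature, base64-encoded (not a full image).
parts = split_data_url("data:image/png;base64,iVBORw0KGgo=")
print(parts)
```

The media type tells you which file extension makes sense, and the payload after the comma is the part that has to be base64-decoded to get the image bytes.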
In Python 3.4 and later, you can read this data and decode it to bytes with urllib.request.urlopen:
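import urllib.request  # a bare "import urllib" does not reliably load the request submodule
data_url = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxM..."
# urlopen decodes the base64 payload; read() returns the raw image bytes
with urllib.request.urlopen(data_url) as response:
    data = response.read()
with open("some_image.jpg", mode="wb") as f:
    f.write(data)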
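Because the data URL above is truncated, here is a small self-contained sketch of the same approach that can be run as-is. It builds its own data URL from a stand-in payload (the 8-byte PNG signature, an assumption made purely for illustration) and also reads the media type from the response headers, which can help when choosing a file extension.

```python
import base64
import urllib.request

# Stand-in payload: the 8-byte PNG signature, not a complete image.
raw = b"\x89PNG\r\n\x1a\n"
data_url = "data:image/png;base64," + base64.b64encode(raw).decode("ascii")

with urllib.request.urlopen(data_url) as response:
    # The media type from the URL is exposed as a Content-Type header.
    content_type = response.headers.get_content_type()
    data = response.read()

print(content_type, data == raw)
```

The bytes returned by `read()` are already decoded, so they can be written straight to a file opened in binary mode.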
Alternatively, you can decode the base64-encoded part of the data URL yourself with base64:
import base64
data_url = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxM..."
base64_image_data = data_url.split(",")[1]
data = base64.b64decode(base64_image_data)
with open("some_image.jpg", mode="wb") as f:
    f.write(data)
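Putting the manual approach together, the following is one possible sketch of a helper that takes a "src" value and a folder and writes the decoded image there. The function name `save_data_url`, the `EXTENSIONS` table, and the `.bin` fallback are all hypothetical choices for illustration, and the demo payload is again only the PNG signature rather than a real image.

```python
import base64
import tempfile
from pathlib import Path

# Hypothetical lookup table; extend it for other image types you expect.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def save_data_url(src, folder, stem="image"):
    """Decode a base64 data URL and write it into folder; return the file path."""
    header, sep, payload = src.partition(",")
    if not (header.startswith("data:") and header.endswith(";base64") and sep):
        raise ValueError("src is not a base64 data URL")
    media_type = header[len("data:"):].split(";")[0]
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Unknown media types fall back to a generic extension.
    target = folder / (stem + EXTENSIONS.get(media_type, ".bin"))
    target.write_bytes(base64.b64decode(payload))
    return target


# Demo with a stand-in payload written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_src = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
    saved = save_data_url(demo_src, tmp)
    print(saved.name, saved.read_bytes())
```

In the script from the question, this could be called as `save_data_url(image, save_folder)` at the "Need help here" marker. Note that some "src" values on the page may be ordinary https URLs rather than data URLs; this sketch raises ValueError for those, and they would need to be downloaded over HTTP instead. Catching a specific exception and printing it, rather than using a bare `except`, would also show what actually went wrong.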