Tags: python, selenium, urllib, urllib3, python-imageio

Using Python and Selenium to download an image using the image's "src" attribute


I'm new to Python and Selenium. My goal is to download an image from the Google Image Search results page and save it as a file in a local directory, but I can't get the image to download in the first place.

I'm aware there are other options (e.g., retrieving the image from its URL with an HTTP request), but I want to know whether it's possible to do it using the image's "src" attribute, e.g., "..."

My code is below (I have left out the definitions of save_folder and PATH for brevity):

import os

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# This creates the folder to store the image in
if not os.path.exists(save_folder):
    os.mkdir(save_folder)

# PATH is the location of the chromedriver executable
driver = webdriver.Chrome(service=Service(PATH))

# Goes to the given web page
driver.get("https://www.google.com/imghp?hl=en&ogbl")

# "q" is the name of the Google search field input
search_bar = driver.find_element(By.NAME, "q")

# Input the search term(s)
search_bar.send_keys("Ben Folds Songs for Silverman Album Cover")

# Submits the search (same as pressing Enter)
search_bar.send_keys(Keys.RETURN)

try:
    # Wait up to 10 seconds for the results container to appear;
    # this returns a single WebElement, not a list
    search_results = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "islrg"))
    )
    # print(search_results.text)

    # Gets all of the <img> elements inside the results container (a list)
    images = search_results.find_elements(By.TAG_NAME, "img")

    # I just want the first result
    image = images[0].get_attribute("src")

    ### Need help here ###

except Exception as e:
    # Print the actual exception instead of hiding it
    print(f"Error: {e}")

finally:
    # Closes the browser exactly once, whether or not an error occurred
    driver.quit()

I have tried:

urllib.request.urlretrieve(image, "00001.jpg")

and

urllib3.request.urlretrieve(image, f"{save_folder}/captcha.png")

but both attempts ended up in the "except" block. After reading a promising post, I also tried:

bufferedImage = imageio.read(image)
outputFile = f"{save_folder}/image.png"
imageio.write(bufferedImage, "png", outputFile)

with the same result, though I believe the example in that post was written in Java, so I may have made a mistake translating it to Python.

I'm sure it's something obvious, but what am I doing wrong? Thank you for any help.


Solution

  • The URL you are dealing with in this case is a data URL: the image data itself, encoded in base64, embedded directly in the URL.

    Since Python 3.4, urllib.request.urlopen can open data URLs, so you can read the decoded bytes with it directly:

    import urllib.request
    
    data_url = "..."
    
    # urlopen decodes the data URL, so read() returns the raw image bytes
    with urllib.request.urlopen(data_url) as response:
        data = response.read()
    
    with open("some_image.jpg", mode="wb") as f:
        f.write(data)

    Alternatively, you can decode the base64-encoded part of the data URL yourself with the base64 module:

    import base64
    
    data_url = "..."
    # Everything after the first comma is the base64 payload
    base64_image_data = data_url.split(",", 1)[1]
    data = base64.b64decode(base64_image_data)
    
    with open("some_image.jpg", mode="wb") as f:
        f.write(data)
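Both snippets above hard-code the .jpg extension, and the manual split assumes the payload is base64. If you would rather not assume either, here is a minimal sketch that reads the media type from the data URL header (assuming the URL follows the RFC 2397 layout data:[<mediatype>][;base64],<data>) and picks a file extension from it. decode_data_url, save_data_url and the EXTENSIONS table are hypothetical helpers for illustration, not part of any library:

```python
import base64
import urllib.parse
from typing import Tuple

# Hypothetical media-type-to-extension table; extend it as needed.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a data URL into its media type and its decoded payload bytes."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    # The header is everything between "data:" and the first comma,
    # e.g. "image/jpeg;base64"
    header, sep, payload = data_url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data URL: missing comma")
    params = header.split(";")
    # RFC 2397: an omitted media type defaults to text/plain
    media_type = params[0].strip().lower() or "text/plain"
    if params[-1].strip().lower() == "base64":
        data = base64.b64decode(payload)
    else:
        # Without ";base64" the payload is percent-encoded
        data = urllib.parse.unquote_to_bytes(payload)
    return media_type, data


def save_data_url(data_url: str, stem: str) -> str:
    """Write the decoded data to stem + extension and return the file name."""
    media_type, data = decode_data_url(data_url)
    # Fall back to a generic extension for media types not in the table
    filename = stem + EXTENSIONS.get(media_type, ".bin")
    with open(filename, mode="wb") as f:
        f.write(data)
    return filename
```

With the question's variables, a call such as save_data_url(image, os.path.join(save_folder, "00001")) would write 00001.jpg for a data:image/jpeg;base64,... source. Note that this only handles data URLs. If the src turns out to be an ordinary http(s) URL, the urlopen approach above covers that case as well.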