python, web-scraping, beautifulsoup, google-image-search

Scrape src attribute from Google with Beautiful Soup only


I'm trying to scrape Google Images. Beautiful Soup extracts the 'src' attribute, but it outputs links like

data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw== 

which is not the actual image. The script tag looks heavily encoded and doesn't contain the actual URI. Can anybody suggest a solution?

Actually, this is a minified data URI which, when decoded, yields a 1x1 image. My question is: how does Google minify the complete data URI, and how can we access the full URI so that we can get the actual image?


Solution

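  • That's the image in Base64 encoding. You can decode it and save it to an image file like this (Python 3; `src` is only the part of the data URI after "base64,"):

    import base64

    src = "BASE64 DATA"  # everything after "data:image/gif;base64,"
    # Decode the Base64 text to raw bytes and write them to a binary file
    with open("MyImage.gif", "wb") as img:
        img.write(base64.b64decode(src))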
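
To check what such a `src` value actually contains before saving it, the data URI from the question can be decoded and inspected. The following is a minimal sketch using only the standard library. The helper names `parse_data_uri` and `gif_size` are made up for this illustration and are not part of Beautiful Soup or any other library.

```python
import base64
import struct

# The data URI quoted in the question.
DATA_URI = "data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="


def parse_data_uri(uri):
    """Split a base64 data URI into (mime_type, decoded_bytes). Hypothetical helper."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64-encoded data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def gif_size(data):
    """Read (width, height) from the GIF header. Hypothetical helper."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Bytes 6-9 hold width and height as little-endian unsigned 16-bit integers.
    return struct.unpack("<HH", data[6:10])


mime_type, image_bytes = parse_data_uri(DATA_URI)
width, height = gif_size(image_bytes)
print(mime_type, width, height)  # prints "image/gif 1 1"
```

This agrees with the question: the decoded bytes are a 1x1 GIF, so writing them to a file gives only that placeholder and not the picture shown on the page.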