Tags: python, web-scraping, beautifulsoup, animated-gif

Python / BeautifulSoup Image Scraping Does Not Save Animated GIFs Correctly


I have a Python script that scrapes images from a website every morning for a daily project I am responsible for. JPGs and PNGs download with no issues. The problem is that animated GIFs are usually saved as static GIFs; only rarely does one come out animated.

I'm not really familiar with BeautifulSoup, so I'm not sure whether I'm doing something wrong or there is a limitation in the way BeautifulSoup handles animated GIFs.

I'm using the Kickstarter URL just for testing purposes...

import os
import sys
import requests
import urllib
import urllib.request
from bs4 import BeautifulSoup
from csv import writer

baseUrl = requests.get('https://www.kickstarter.com/projects/peak-design/travel-tripod-by-peak-design')
soup = BeautifulSoup(baseUrl.text, 'html.parser')

allImgs = soup.findAll('img')

imgCounter = 1

for img in allImgs:
    newImg = img.get('src')

    # CHECK EXTENSION
    if '.jpg' in newImg:
        extension = '.jpg'
    elif '.png' in newImg:
        extension = '.png'
    elif '.gif' in newImg:
        extension = '.gif'

    imgFile = open(str(imgCounter) + extension, 'wb')
    imgFile.write(urllib.request.urlopen(newImg).read())
    imgCounter = imgCounter + 1
    imgFile.close()

Any help or insight on this issue would be most appreciated!!!

-S


Solution

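  • Here's what works for me. For GIFs I need to read the URL from the data-src attribute rather than src, which I had been using for ALL images. On this page the src of a GIF appears to point to a static placeholder, while data-src holds the animated file, so the code now checks data-src first and falls back to src when it is missing.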

    Here's the revised code:

    import urllib.request

    import requests
    from bs4 import BeautifulSoup

    page = requests.get('https://www.kickstarter.com/projects/peak-design/travel-tripod-by-peak-design')
    soup = BeautifulSoup(page.text, 'html.parser')

    imgCounter = 1

    for img in soup.find_all('img'):
        # Prefer data-src (the animated file for GIFs); fall back to src.
        newImg = img.get('data-src')
        if newImg is None:
            newImg = img.get('src')
        if newImg is None:
            continue  # this tag has no usable URL

        # CHECK EXTENSION
        if '.jpg' in newImg:
            extension = '.jpg'
        elif '.png' in newImg:
            extension = '.png'
        elif '.gif' in newImg:
            extension = '.gif'
        else:
            continue  # skip anything that is not a JPG, PNG, or GIF

        # The with block closes the file even if the download fails.
        with open(str(imgCounter) + extension, 'wb') as imgFile:
            imgFile.write(urllib.request.urlopen(newImg).read())
        imgCounter = imgCounter + 1
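
In case it helps anyone else, the data-src/src choice and the extension check can be pulled out into two small helpers. This is only a sketch: the names `choose_image_url`, `extension_for`, and `ALLOWED_EXTENSIONS` are made up for illustration, and the assumption that relative URLs should be resolved against the page URL is mine, not something the page above required. It reads the extension from the URL path, so a query string such as `?w=680` or an upper-case `.GIF` does not confuse it.

```python
import os
from urllib.parse import urljoin, urlparse

# Hypothetical whitelist; adjust to the formats you actually want to keep.
ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif'}


def choose_image_url(attrs, page_url):
    """Return the absolute URL to download for one <img>, or None.

    attrs is anything with a dict-style .get(), for example a
    BeautifulSoup Tag or a plain dict of attributes.
    """
    # Prefer data-src; fall back to src when it is missing or empty.
    url = attrs.get('data-src') or attrs.get('src')
    if not url:
        return None
    # Resolve relative paths against the page the tag came from.
    return urljoin(page_url, url)


def extension_for(url):
    """Return the lower-cased file extension from the URL path, or None."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return ext if ext in ALLOWED_EXTENSIONS else None
```

Inside the loop this would replace the data-src/src lookup and the if/elif chain: call `choose_image_url(img, page.url)`, skip the tag when it returns None, then call `extension_for` on the result and skip again if that returns None.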