I have a Python script that scrapes some images from a website every morning, for a daily project I am responsible for. It all works fine for JPGs and PNGs. The problem is that animated GIFs are usually saved as static GIFs; only rarely does one come through animated.
I'm not really familiar with BeautifulSoup, so I'm not sure whether I'm doing something wrong or whether there is a limitation in the way BeautifulSoup handles animated GIFs.
I'm using the Kickstarter URL just for testing purposes:
import os
import sys
import requests
import urllib
import urllib.request
from bs4 import BeautifulSoup
from csv import writer

baseUrl = requests.get('https://www.kickstarter.com/projects/peak-design/travel-tripod-by-peak-design')
soup = BeautifulSoup(baseUrl.text, 'html.parser')
allImgs = soup.findAll('img')
imgCounter = 1
for img in allImgs:
    newImg = img.get('src')
    # CHECK EXTENSION
    if '.jpg' in newImg:
        extension = '.jpg'
    elif '.png' in newImg:
        extension = '.png'
    elif '.gif' in newImg:
        extension = '.gif'
    imgFile = open(str(imgCounter) + extension, 'wb')
    imgFile.write(urllib.request.urlopen(newImg).read())
    imgCounter = imgCounter + 1
    imgFile.close()
Any help or insight on this issue would be most appreciated!!!
-S
Here's what works for me.
For GIFs I need to grab the URL from the data-src attribute rather than from src, which is what I was using for ALL images.
The revised code reads data-src first and falls back to src when a tag has no data-src:
import os
import sys
import requests
import urllib
import urllib.request
from bs4 import BeautifulSoup
from csv import writer

baseUrl = requests.get('https://www.kickstarter.com/projects/peak-design/travel-tripod-by-peak-design')
soup = BeautifulSoup(baseUrl.text, 'html.parser')
allImgs = soup.find_all('img')
imgCounter = 1
for img in allImgs:
    # Prefer data-src (this is where the animated GIF URL is); fall back to src
    newImg = img.get('data-src')
    if newImg is None:
        newImg = img.get('src')
    if newImg is None:
        continue  # this tag has no usable URL at all
    # CHECK EXTENSION
    if '.jpg' in newImg:
        extension = '.jpg'
    elif '.png' in newImg:
        extension = '.png'
    elif '.gif' in newImg:
        extension = '.gif'
    else:
        continue  # skip anything that is not a JPG, PNG or GIF
    # The with block closes the file even if the download raises
    with open(str(imgCounter) + extension, 'wb') as imgFile:
        imgFile.write(urllib.request.urlopen(newImg).read())
    imgCounter = imgCounter + 1
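
As a follow-up, the "data-src first, then src" choice and the extension check could be pulled out into two small helpers. This is only a sketch under a few assumptions: pick_image_url and guess_extension are names I made up (they are not part of BeautifulSoup), example.com is a placeholder, and the helpers only rely on the tag having a .get method, so they accept a BeautifulSoup tag or a plain dict of attributes. Compared with the substring test above, it reads the extension from the URL path (so query strings don't confuse it) and resolves relative URLs against the page URL.

```python
import os
from urllib.parse import urljoin, urlparse

# Extensions this sketch is willing to save (an assumption, adjust as needed).
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def pick_image_url(img, page_url):
    """Return an absolute image URL, preferring data-src over src.

    img is anything with a .get method (a BeautifulSoup tag or a dict).
    Returns None when neither attribute holds a value.
    """
    raw = img.get("data-src") or img.get("src")
    if not raw:
        return None
    # urljoin leaves absolute URLs alone and resolves relative ones.
    return urljoin(page_url, raw)


def guess_extension(url):
    """Return the lower-cased file extension from the URL path, or None."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in ALLOWED_EXTENSIONS else None


# Hypothetical attribute sets standing in for two <img> tags.
page = "https://example.com/projects/demo"
lazy_gif = {"src": "/img/placeholder.gif", "data-src": "/img/animated.gif?v=2"}
plain_jpg = {"src": "https://cdn.example.com/photo.JPG"}

for tag in (lazy_gif, plain_jpg):
    url = pick_image_url(tag, page)
    print(url, guess_extension(url))
```

Inside the loop you would then call pick_image_url(img, pageUrl) with the page's address, skip the tag when either helper returns None, and download the URL as before.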