I am trying to scrape some images from Google, but the site's scroll-down expansion limits me to downloading only a certain number of images. Is there any way to mimic that scrolling with Python code? For instance, could Mechanize be used in such a case?
In short, I need to simulate the scroll-down expansion of Google Image search so that more results are returned and I can extract the image URLs to scrape.
This will probably get you banned pretty quickly, but I'm not sure. Rather than simulating the scrolling itself, it requests the extra result pages that the scroll would normally load by passing the start and ijn parameters directly. It requires BeautifulSoup and requests.
import requests
from bs4 import BeautifulSoup
s = requests.session()
s.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"})
URL = "https://www.google.dk/search"
images = []
def get_images(query, start):
    screen_width = 1920
    screen_height = 1080
    params = {
        "q": query,
        "sa": "X",
        "biw": screen_width,
        "bih": screen_height,
        "tbm": "isch",        # image search
        "ijn": start // 100,  # result "page" index that the infinite scroll requests
        "start": start,
        # "ei": "" - This seems like a unique ID; you might want to use it to avoid getting banned. But you probably still will be.
    }
    response = s.get(URL, params=params)
    bs = BeautifulSoup(response.text, "html.parser")
    for img in bs.find_all("div", {"class": "rg_di"}):
        images.append(img.find("img").attrs["data-src"])
# Will fetch roughly 400 images (five pages of results).
for x in range(0, 5):
    get_images("cats", x * 100)

for url in images:
    print(url)
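
Once the URLs are collected you still have to download the files themselves. Below is a minimal sketch of how that could look, reusing the same session `s`; the `download_images` helper and the `out_dir` name are my own illustration, not part of the answer above, and it assumes the collected `data-src` values are directly fetchable image URLs.

import os

def download_images(urls, out_dir="cat_images"):
    # Hypothetical helper: save each collected thumbnail URL to a local folder.
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls):
        try:
            resp = s.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue  # skip URLs that fail or have expired
        # The .jpg extension is a guess; Google thumbnails are usually JPEGs.
        with open(os.path.join(out_dir, "img_%04d.jpg" % i), "wb") as f:
            f.write(resp.content)

download_images(images)

Keeping the downloads on the same session (and adding a small delay between requests) should look slightly less bot-like than opening a fresh connection per image, but it is no guarantee against getting blocked.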