I'm trying to obtain images from Google Image Search for a specific query, but the page I download contains no pictures, and it redirects me to Google's original page. Here's my code:

    import subprocess
    import urllib

    class Scraper(object):  # hypothetical name: the question omits the enclosing class
        AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
        GOOGLE_URL = "https://www.google.com/images?source=hp&q={0}"
        _myGooglePage = ""

        def scrape(self, theQuery):
            url = self.GOOGLE_URL.format(urllib.quote(theQuery))
            # -s: without it, curl's progress meter (on stderr) ends up in the saved HTML
            self._myGooglePage = subprocess.check_output(
                ["curl", "-s", "-L", "-A", self.AGENT_ID, url], stderr=subprocess.STDOUT)
            print url
            print self._myGooglePage
            with open('./../../googleimages.html', 'w') as f:
                f.write(self._myGooglePage)

What am I doing wrong?
Thanks
I'll give you a hint ... start here:
https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=JULIE%20NEWMAR
Here JULIE and NEWMAR are your search terms, joined by a URL-encoded space (%20).
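Rather than encoding the query by hand, you can build that URL with the standard library. This is a minimal Python 3 sketch (the question's code is Python 2). The helper name build_search_url is hypothetical; it just reproduces the URL above for any query string.

```python
from urllib.parse import quote, urlencode

# Base URL of the (deprecated) AJAX image search endpoint from the answer.
SEARCH_BASE = "https://ajax.googleapis.com/ajax/services/search/images"


def build_search_url(query):
    """Hypothetical helper: build the search URL for a free-text query."""
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use '+'
    params = urlencode({"v": "1.0", "q": query}, quote_via=quote)
    return SEARCH_BASE + "?" + params


print(build_search_url("JULIE NEWMAR"))
# https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=JULIE%20NEWMAR
```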
That returns the JSON data you need. Parse it with json.load (or simplejson.load) to get back a dict, then dig into it: first the responseData key, then its results list, whose individual items each carry the url of an image you can then download.
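The parse-and-download steps described above can be sketched in Python 3 as follows. The function names image_urls_from_response, fetch_image_urls and download are hypothetical, and the sample dict only mimics the responseData → results → url shape the answer describes. Because the endpoint was already deprecated, it may no longer respond at all, so the usage line below only exercises the offline parsing part.

```python
import json
import os
import urllib.request
from urllib.parse import quote, urlencode, urlsplit

# The (deprecated) endpoint from the answer; it may no longer respond.
SEARCH_BASE = "https://ajax.googleapis.com/ajax/services/search/images"


def image_urls_from_response(data):
    """Return the 'url' of every item in data['responseData']['results']."""
    response_data = data.get("responseData") or {}  # may be null on errors
    return [item["url"] for item in response_data.get("results", []) if "url" in item]


def fetch_image_urls(query):
    """Query the endpoint and return the image URLs (needs network access)."""
    url = SEARCH_BASE + "?" + urlencode({"v": "1.0", "q": query}, quote_via=quote)
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)  # json.load reads from the file-like response
    return image_urls_from_response(data)


def download(url, directory="."):
    """Save one image into `directory`, named after the URL's last path segment."""
    name = os.path.basename(urlsplit(url).path) or "image"
    target = os.path.join(directory, name)
    urllib.request.urlretrieve(url, target)
    return target


# Made-up sample with the same nesting as the real response.
sample = {
    "responseData": {
        "results": [
            {"url": "http://example.com/a.jpg"},
            {"url": "http://example.com/b.png"},
        ]
    },
    "responseStatus": 200,
}
print(image_urls_from_response(sample))
# ['http://example.com/a.jpg', 'http://example.com/b.png']
```

Splitting the parsing out of the network call keeps the dict-walking testable on its own; with a live endpoint you would loop over fetch_image_urls(query) and pass each URL to download.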
That said, I don't recommend automated scraping of Google in any form, since the documentation for their (deprecated) API specifically says not to.