Tags: python, json, httpresponse, google-search-api

Convert Google search results into JSON in Python 3.1


I am writing a Python program that feeds a search term to Google using the Google search API and downloads the first 10 results. I was able to do this in Python 2.6 as follows:

query = urllib.parse.urlencode({'q' : 'searchterm','start' : k},doseq=False)
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' \
             % (query)
results = urllib.urlopen(url)
resultsjson = json.loads(results.read())
betterResults += resultsjson["responseData"]["results"]

Google's search API returns the results as JSON, so I used the above code to load the results into a JSON object of my own and parse them into a list (betterResults).

When I switched over to Python 3, my program began throwing exceptions. Apparently, in Python 2.6 the object returned by urlopen() is a file-like object whose contents can be loaded as JSON. In Python 3.1, the object returned is an HTTPResponse object, which does have a read() method, as the json module requires, but read() returns bytes rather than a string. I was therefore unable to access the information as I had in 2.6.

Is there any way to access the JSON returned by Google? How can I get the results in Python 3 and still select which fields I want, as I could with the parsed JSON in 2.6?

Thank you very much, bsg


Solution

  • The object returned by urlopen is file-like, so you are wrong there. But you are using json.loads(), which expects a string; json.load() is the one that expects a file-like object.

    However, json.load() expects the read() method to return a string, whereas the read() you get here returns bytes, so you need to decode the bytes to a string first.

    So, something like this:

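    import json
    import urllib.parse
    import urllib.request

    # k (result offset) and betterResults (a list) come from the surrounding program
    query = urllib.parse.urlencode({'q' : 'searchterm','start' : k},doseq=False)
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' \
                 % (query)
    results = urllib.request.urlopen(url)
    # Take the charset from the Content-Type header; fall back to UTF-8 if absent
    encoding = results.headers.get_content_charset() or 'utf-8'
    resultsjson = json.loads(results.read().decode(encoding))
    betterResults += resultsjson["responseData"]["results"]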
    

    Might work. (I didn't test it).
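To check the decoding step without hitting the network, here is a minimal sketch of the two routes described above: decode the bytes yourself and call json.loads(), or wrap the binary stream in a text layer and call json.load(). The helper names (decode_json_bytes, load_json_stream) and the sample payload are made up for illustration, and io.BytesIO stands in for the object returned by urlopen(). I have only tried the wrapper on an in-memory stream, not on a live response.

```python
import io
import json


def decode_json_bytes(raw, charset=None):
    """Decode a bytes payload to str, then parse it as JSON."""
    # Assumes UTF-8 when the server did not declare a charset.
    return json.loads(raw.decode(charset or "utf-8"))


def load_json_stream(binary_stream, charset=None):
    """Wrap a binary file-like object so json.load() reads text from it."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=charset or "utf-8")
    return json.load(text_stream)


# Hypothetical payload shaped like the response in the question.
sample = b'{"responseData": {"results": [{"title": "caf\xc3\xa9"}]}}'

# Route 1: bytes -> str -> json.loads()
print(decode_json_bytes(sample)["responseData"]["results"])

# Route 2: binary stream -> text wrapper -> json.load()
print(load_json_stream(io.BytesIO(sample))["responseData"]["results"])
```

Either way you end up with ordinary dicts and lists, so picking fields such as resultsjson["responseData"]["results"] works the same as it did in 2.6.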