NoneType Error in Python Google Search Script - Is this a spam prevention tactic?


Fairly new to Python so apologies if this is a simple ask. I have browsed other answered questions but can't seem to get it functioning consistently.

I found the script below, which prints the top Google result for each term in a list. It works the first few times I run it, but after I have searched 20 or so terms it fails with the following error:

Traceback (most recent call last):
  File "term2url.py", line 28, in <module>
    results = json['responseData']['results']
TypeError: 'NoneType' object has no attribute '__getitem__'

From what I can gather, this indicates that one of the attributes does not have a defined value (potentially a result of Google blocking me?). I tried to solve the issue by adding the else clause, but I still run into the same problem.

Any help would be greatly appreciated; I have pasted the full code below.

Thanks!

#
# This is a quick and dirty script to pull the most likely url and description
# for a list of terms.  Here's how you use it:
#
# python term2url.py < {a txt file with a list of terms} > {a tab delimited file of results}
#
# You must install the simplejson module to use it
#
import urllib
import urllib2
import simplejson
import sys

# Read the terms we want to convert into URL from info redirected from the command line
terms = sys.stdin.readlines()

for term in terms:

   # Define the query to pass to Google Search API
   query = urllib.urlencode({'q' : term.rstrip("\n")})
   url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s" % (query)

   # Fetch the results and convert to JSON format
   search_results = urllib2.urlopen(url)
   json = simplejson.loads(search_results.read())

   # Process the results by pulling the first record, which has the best match
   results = json['responseData']['results']
   for r in results[:1]:
      if results is not None:
         url = r['url']
         desc = r['content'].encode('ascii', 'replace')
      else:
         url = "none"
         desc = "none"


   # Print the results to stdout.  Use redirect to capture the output
   print "%s\t%s" % (term.rstrip("\n"), url)

import time
time.sleep(1)

Solution

  • Here are some Python details for you first:

    None is a valid object in Python, of the type NoneType:

    print(type(None))
    

    Produces (on Python 3; Python 2 prints <type 'NoneType'>):

    <class 'NoneType'>

    The "no attribute" error you got is what Python raises when you try to access a method or attribute that an object doesn't have. In this case, you were using subscript syntax (object[item_index]), which Python implements by calling the object's __getitem__ method. NoneType objects don't define __getitem__, so the lookup fails.
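A minimal sketch of that failure, using a hand-built dictionary rather than a real API response (the `payload` name and its contents are made up for illustration):

```python
# Hypothetical stand-in for a parsed response whose 'responseData' is null.
payload = {"responseData": None}

try:
    # payload["responseData"] is None, so the second subscript fails.
    results = payload["responseData"]["results"]
except TypeError as exc:
    message = str(exc)
    print("TypeError: %s" % message)
```

On Python 2 the message reads "'NoneType' object has no attribute '__getitem__'", as in the traceback above; on Python 3 it reads "'NoneType' object is not subscriptable".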

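    The point of the previous explanation is that your assumption about what your error means is correct: json['responseData'] is None, so there are no results to read. Because the error is raised on that line, before the for loop starts, the else clause you added is never reached.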

    As for why you're hitting this in the first place, I believe you are running up against Google's API limits. It looks like you're using the old API, which is now deprecated. The number of search results (not queries) used to be limited to around 64 per query, and there used to be no rate or per-day limit. However, since it has been deprecated for over 5 years now, there may be new undocumented limits.

    I don't think it necessarily has anything to do with spam prevention, but I do believe it is an undocumented limit.
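One way to keep the script from crashing is to check `responseData` before indexing into it. The sketch below assumes the response shape implied by the question's script (`responseData` -> `results` -> `url`); `first_result_url` and the sample dictionaries are hypothetical names used for illustration, and nothing here guarantees that Google will stop returning empty responses.

```python
def first_result_url(response, default="none"):
    """Return the URL of the first search result, or `default` if there is none.

    `response` is the already-parsed JSON dictionary. The key names follow
    the question's script; the function name itself is hypothetical.
    """
    data = response.get("responseData")
    if not data:
        # None (or missing) when the API sends back no data.
        return default
    results = data.get("results") or []
    if not results:
        # The query succeeded but produced an empty result list.
        return default
    return results[0].get("url", default)


# Hand-built sample responses, not real API output.
ok = {"responseData": {"results": [{"url": "http://example.com/"}]}}
blocked = {"responseData": None}

print(first_result_url(ok))       # http://example.com/
print(first_result_url(blocked))  # none
```

In the question's loop this would stand in for the direct `json['responseData']['results']` lookup. Moving `time.sleep(1)` inside the `for term in terms:` loop may also help, since at the bottom of the file it runs only once, after every request has already been sent. When `responseData` is None, printing the whole parsed response is worth trying, because any status or error message the API sends back would show up there.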