Tags: python, amazon-web-services, amazon-cloudsearch

CloudSearch Request Exceeds 10,000 Limit


When I run a search query that has more than 10,000 matches, I get the following error:

{u'message': u'Request depth (10100) exceeded, limit=10000', u'__type': u'#SearchException', u'error': {u'rid': u'zpXDxukp4bEFCiGqeQ==', u'message': u'[*Deprecated*: Use the outer message field] Request depth (10100) exceeded, limit=10000'}}

When I search with narrower keywords, or run queries with fewer results, everything works fine and no error is returned.

I guess I have to limit the search somehow, but I'm unable to figure out how. My search function looks like this:

def execute_query_string(self, query_string):
    amazon_query = self.search_connection.build_query(q=query_string, start=0, size=100)

    json_search_results = []
    for json_blog in self.search_connection.get_all_hits(amazon_query):
        json_search_results.append(json_blog)

    results = []
    for json_blog in json_search_results:
        results.append(json_blog['fields']) 

    return results

And it's being called like this:

results = searcher.execute_query_string(request.GET.get('q', ''))[:100]

As you can see, I've tried to limit the results with the start and size attributes of build_query(). I still get the error though.

I must have misunderstood how to avoid getting more than 10,000 matches in a search result. Can someone tell me how to do it?

All I can find on this topic is Amazon's Limits page, which says that you can only request 10,000 results. It does not say how to stay under that limit.


Solution

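  • You're calling get_all_hits, which pages through ALL results for your query. Your size param only sets how many hits each page fetches; it does not cap the total. The generator keeps advancing start until start + size goes past 10,000, which matches the depth of 10100 (10000 + 100) in your error.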

    From the docs:

    get_all_hits(query) Get a generator to iterate over all search results

    Transparently handles the results paging from Cloudsearch search results so even if you have many thousands of results you can iterate over all results in a reasonably efficient manner.

    http://boto.readthedocs.org/en/latest/ref/cloudsearch2.html#boto.cloudsearch2.search.SearchConnection.get_all_hits

    You should be calling search instead, which sends a single request and returns only the page described by start and size: http://boto.readthedocs.org/en/latest/ref/cloudsearch2.html#boto.cloudsearch2.search.SearchConnection.search
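
As a rough sketch of that change, the function from the question could be rewritten along these lines. It assumes search_connection is a boto.cloudsearch2 SearchConnection whose search() accepts q, start and size and returns an iterable of hit dicts. The free-function form and the limit parameter are illustrative choices, not part of the original code.

```python
def execute_query_string(search_connection, query_string, limit=100):
    """Return the 'fields' dict of at most `limit` hits for `query_string`.

    Assumption: `search_connection` is a boto.cloudsearch2 SearchConnection,
    or any object with a compatible search(q=..., start=..., size=...) method.
    `limit` is a hypothetical parameter added for this sketch.
    """
    # search() sends exactly one request, so start + size equals `limit`
    # here, well below the 10,000 request-depth limit from the error.
    results = search_connection.search(q=query_string, start=0, size=limit)

    # Iterating the result yields one dict per hit; the document fields
    # live under the 'fields' key, as in the original function.
    return [hit['fields'] for hit in results]
```

With this version the call site would pass the connection explicitly, for example execute_query_string(searcher.search_connection, request.GET.get('q', '')), and the trailing [:100] slice is no longer needed because the request itself is capped.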