Search code examples
pythonwhooshsearch-suggestion

Generating search term suggestions with Whoosh?


I've got a set of documents in a Whoosh index, and I want to provide a search term suggestion feature. So If you type "pop", some suggestions that could come up might be:

  • popcorn
  • popular
  • pope
  • Poplar Film
  • pop culture

I've got the terms that should be coming up as suggestions going into an NGRAMWORDS field in my index, but when I do a query on that field I get autocompleted results rather than the expanded suggestions - so I get documents tagged with "pop culture", but no way to show that term to the user. (For comparison, I'd do this in ElasticSearch using a completion mapping on that field and then use the _suggest endpoint to get the suggestions.)

I can only find examples for autocomplete or spelling correction in the documentation or elsewhere on on the web. Is there any way I can get search term suggestions from my index with Whoosh?

Edit: expand_prefix was a much-needed pointer in the right direction. I've ended up using a KEYWORD(commas=True, lowercase=True) for my suggest field, and code like this to get suggestions in most-common-first order (expand_prefix and iter_prefix will yield them in alphabetical order):

def get_suggestions(term):
    with ix.reader() as r:
        suggestions = [(s[0], s[1].doc_frequency()) for s in r.iter_prefix('suggest', term)]
    return sorted(suggestions, key=itemgetter(1), reverse=True)

Solution

  • This is not what you are looking for exactly, but probably can help you:

    reader = index.reader()
    for x in r.expand_prefix('title', 'pop'):
      print x
    

    Output example:

    pop
    popcorn
    popular
    

    Update

    Another workaround is to build another index with keywords as TEXT only. And play with search language. What I could achieve:

    In [12]: list(ix.searcher().search(qp.parse('pop*')))
    Out[12]: 
    [<Hit {'keywords': u'popcorn'}>,
     <Hit {'keywords': u'popular'}>,
     <Hit {'keywords': u'pope'}>,
     <Hit {'keywords': u'Popular Film'}>,
     <Hit {'keywords': u'pop culture'}>]