solr elasticsearch wildcard proximity phrase

Proximity searching phrases with root expanders in Solr or ElasticSearch (especially websolr or bonsai.io)?


I'm trying to select a search tool for a large project, and I'd like to know whether Solr or ElasticSearch supports this use case.

My customers want to run relatively sophisticated boolean searches. One must-have is the ability to run proximity searches on phrases with root expanders (trailing wildcards).

For example, imagine a user searching for a document with this phrase: "The cute dog was attacked by evil raccoons"

I'd like the user to be able to search for "evil rac*" within 5 words of "dog" and return a document with the above sentence. Ideally, a query would look something like:

("evil rac*" dog)~5
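To pin down the semantics I have in mind, here is a minimal plain-Python sketch. It is not Solr, ElasticSearch, or dtSearch code; the function names (`tokenize`, `phrase_positions`, `within`) are made up for illustration, and "within N words" is assumed to mean that the nearest word positions of the two phrases differ by at most N.

```python
import re


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_positions(tokens, phrase):
    # Return (start, end) token positions of every match of `phrase`.
    # A phrase word ending in "*" matches any token with that prefix.
    parts = phrase.lower().split()
    hits = []
    for i in range(len(tokens) - len(parts) + 1):
        matched = True
        for j, part in enumerate(parts):
            token = tokens[i + j]
            if part.endswith("*"):
                if not token.startswith(part[:-1]):
                    matched = False
                    break
            elif token != part:
                matched = False
                break
        if matched:
            hits.append((i, i + len(parts) - 1))
    return hits


def within(text, phrase_a, phrase_b, distance):
    # True if some match of phrase_a lies within `distance` word
    # positions of some match of phrase_b, in either order.
    tokens = tokenize(text)
    for a_start, a_end in phrase_positions(tokens, phrase_a):
        for b_start, b_end in phrase_positions(tokens, phrase_b):
            if a_start > b_end:
                gap = a_start - b_end
            elif b_start > a_end:
                gap = b_start - a_end
            else:
                gap = 0  # the two matches overlap
            if gap <= distance:
                return True
    return False


doc = "The cute dog was attacked by evil raccoons"
print(within(doc, "evil rac*", "dog", 5))
```

With the sentence above, "dog" sits at position 2 and "evil raccoons" starts at position 6, so the search should match at a distance of 5 but not at a distance of 3.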

So far, the only search tool I've found that can do what I'm looking for is dtSearch. The query for dtSearch would be "evil rac*" w/5 dog, which is great. I'd rather use an open source tool like Solr or ElasticSearch (and especially a hosted solution such as websolr or bonsai.io). Any advice would be very much appreciated.


Solution

  • Definitely technically possible, but as of yet unsupported in Lucene's query syntax. There are a few open issues to support "complex phrase" behavior in Lucene, which seem to be targeted at Lucene 4.3:

    LUCENE-1486 — An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.

    I don't see your specific query structure in their examples there, but this is definitely a lot closer than what's available today.

    To recap: theoretically feasible, not supported in syntax as of April 2013 and Lucene 4.2.1.

    (Hat tip to my business partner, Kyle, for help researching this.)
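One way to read "technically possible" is that the query can be assembled by hand from span queries rather than typed as query-string syntax. The sketch below only builds a JSON body in the shape of the ElasticSearch query DSL (`span_near`, `span_term`, and `span_multi` wrapping a `prefix` query). The helper name `proximity_query` and the field name `body` are hypothetical, and whether a given hosted version accepts `span_multi` is an assumption to verify against that service. Span queries are not analyzed, so the terms are assumed to be already lowercased to match the index.

```python
import json


def proximity_query(field, phrase_terms, prefix, other_term, slop):
    # Inner clause: the exact words of the phrase, in order and adjacent,
    # followed by a prefix-expanded last word (e.g. "rac*").
    phrase_clauses = [{"span_term": {field: term}} for term in phrase_terms]
    phrase_clauses.append(
        {"span_multi": {"match": {"prefix": {field: {"value": prefix}}}}}
    )
    phrase = {
        "span_near": {"clauses": phrase_clauses, "slop": 0, "in_order": True}
    }
    # Outer clause: the phrase near the other term, in either order.
    return {
        "span_near": {
            "clauses": [phrase, {"span_term": {field: other_term}}],
            "slop": slop,
            "in_order": False,
        }
    }


# "evil rac*" within 5 words of "dog", against a hypothetical "body" field.
query = {"query": proximity_query("body", ["evil"], "rac", "dog", 5)}
print(json.dumps(query, indent=2))
```

This is a sketch of the request body only; it does not send anything to a server, and the slop value counts unmatched positions between the clauses, which may not line up exactly with dtSearch's w/5.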