solr, lucene, full-text-search, information-retrieval, booleanquery

Solr query with multiple negations


On Solr 6.5.1, I have a *_txt_en field and a string document-type field. On these fields, I would like to build a query of the form:

Match all documents of a certain document-type, where:

  1. At least one of certain phrases ("phrase one", "phrase two") must occur in the text field for the document to match
  2. But if any of the other phrases ("phrase three", "phrase four", "phrase five") also occurs in this field, the document must not match.

My current Solr query looks as follows:

(documenttype:references AND (field:"phrase one" OR field:"phrase two")) AND NOT field:"phrase three" AND NOT field:"phrase four" AND NOT field:"phrase five"

An alternative I can think of is:

(documenttype:references AND (field:"phrase one" OR field:"phrase two")) AND NOT (field:"phrase three" OR field:"phrase four" OR field:"phrase five")

The above queries seem to work on a toy data set of a couple of examples. But I have learned that Solr has some unwritten rules and non-obvious pitfalls, especially with negations as part of Boolean queries.

For a query like the one I described, is this the right syntax?


Solution

  • Your query looks fine to me.

    A NOT in Lucene/Solr is used to filter out results; it does not imply matching everything else, as it would in a database (well, sometimes it does in Solr). An easy way to think about how negations work in Lucene is to assume that there is always an AND in front of them.

    • term1 OR NOT term2 will actually return the results of term1 AND NOT term2
    • NOT term1 won't work in Lucene, because AND NOT term1 on its own doesn't make sense. (Solr makes it work by automatically transforming it into *:* AND NOT term1, which is why Solr's treatment of NOT is kinda inconsistent)
    • term1 AND (NOT term2) won't work, because the part inside the parentheses is evaluated first, as AND NOT term2 on its own, before moving on to the parts outside (I don't think Solr corrects for this one, but don't quote me)
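The three cases above can be reproduced with a small toy model. This is a sketch, not the Lucene API: `BoolQuery`, `term`, `MATCH_ALL`, and `search` are hypothetical names, and the model assumes the usual reading of the operators as required, optional, and prohibited clauses, where a prohibited clause only removes documents that some other clause has selected.

```python
from dataclasses import dataclass, field


def term(t):
    """Return a clause that matches documents containing the term t."""
    return lambda doc: t in doc


# Stand-in for Solr's *:* (matches every document).
MATCH_ALL = lambda doc: True


@dataclass
class BoolQuery:
    must: list = field(default_factory=list)      # required clauses (AND)
    should: list = field(default_factory=list)    # optional clauses (OR)
    must_not: list = field(default_factory=list)  # prohibited clauses (NOT)

    def matches(self, doc):
        # A prohibited clause only removes documents; it never selects any.
        if any(clause(doc) for clause in self.must_not):
            return False
        # With required clauses present, optional ones do not affect matching.
        if self.must:
            return all(clause(doc) for clause in self.must)
        # Otherwise at least one optional clause has to match, so a query
        # made only of prohibited clauses selects nothing.
        return any(clause(doc) for clause in self.should)


# Toy index: document id -> set of terms (made-up data).
docs = {
    1: {"term1"},
    2: {"term1", "term2"},
    3: {"term2"},
    4: {"other"},
}


def search(query):
    return sorted(doc_id for doc_id, doc in docs.items() if query.matches(doc))


# term1 OR NOT term2: behaves like term1 AND NOT term2
or_not = BoolQuery(should=[term("term1")], must_not=[term("term2")])

# NOT term1: nothing selects documents, so nothing matches
pure_not = BoolQuery(must_not=[term("term1")])

# *:* AND NOT term1: the rewrite that makes the pure negation work
match_all_not = BoolQuery(must=[MATCH_ALL], must_not=[term("term1")])

# term1 AND (NOT term2): the nested pure negation matches nothing
nested_not = BoolQuery(
    must=[term("term1"), BoolQuery(must_not=[term("term2")]).matches]
)

print(search(or_not))         # [1]
print(search(pure_not))       # []
print(search(match_all_not))  # [3, 4]
print(search(nested_not))     # []
```

A database-style reading of term1 OR NOT term2 would return documents 1, 2, and 4 here; the clause model returns only document 1.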

    For a bit more explanation of why it differs from DB style boolean logic, take a look at my answer here
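If the phrase lists change often, the second query form from the question can be assembled from lists instead of being written by hand. The following is only a sketch: `phrase`, `any_of`, and `build_query` are hypothetical helpers, and the host, port, and core name (`mycore`) in the URL are placeholders. Nothing is sent over the network; the code only builds the query string and a URL-encoded request.

```python
from urllib.parse import urlencode


def phrase(field_name, text):
    # Quote the phrase and escape characters that would end the quoted string early.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field_name}:"{escaped}"'


def any_of(field_name, phrases):
    # Parenthesized OR group: (field:"a" OR field:"b" ...)
    return "(" + " OR ".join(phrase(field_name, p) for p in phrases) + ")"


def build_query(doc_type, field_name, wanted, unwanted):
    # The positive part selects documents; the negated group then filters them.
    positive = f"(documenttype:{doc_type} AND {any_of(field_name, wanted)})"
    return f"{positive} AND NOT {any_of(field_name, unwanted)}"


query = build_query(
    "references",
    "field",
    wanted=["phrase one", "phrase two"],
    unwanted=["phrase three", "phrase four", "phrase five"],
)

# Placeholder host and core name; adjust to the actual installation.
url = "http://localhost:8983/solr/mycore/select?" + urlencode({"q": query, "wt": "json"})

print(query)
print(url)
```

Because the negated group sits next to a positive clause at the same level, it acts as a filter on documents that the positive part has already selected, which is the case the answer describes as safe.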