Tags: elasticsearch, lucene, elasticsearch-query

Elasticsearch OR operator is not working as I expected


Why do I have to write this to get all results for my three terms:

...
"must": [
  {
    "query_string": {
      "analyze_wildcard": true,
      "query": "* AND (name:\"NAME 1\" OR name:\"NAME 2\" OR name:\"NAME 3\")"
    }
  },
...

instead of

...
"must": [
  {
    "query_string": {
      "analyze_wildcard": true,
      "query": "* AND name:\"NAME 1\" OR name:\"NAME 2\" OR name:\"NAME 3\""
    }
  },
...

The first query returns all docs with NAME 1, NAME 2, and NAME 3, but the second query returns only docs with the term NAME 1.

In addition, the query below returns only docs with NAME 3:

...
"must": [
  {
    "query_string": {
      "analyze_wildcard": true,
      "query": "name:\"NAME 1\" OR name:\"NAME 2\" OR name:\"NAME 3\" AND *"
    }
  },
...

This doesn't make sense to me. If I write a query with a term that doesn't exist, such as "* AND name:\"asdfasdfa\" OR name:\"NAME 2\" OR name:\"NAME 3\"", I get an empty result. But thinking in terms of code conditionals:

true && false || true || true is true

true && (false || true || true) also true

and,

'a' && null || 'b' || 'c' is 'b'

'a' && (null || 'b' || 'c') also 'b'


Solution

  • It's a matter of operator precedence. Quoting the docs:

    The familiar boolean operators AND, OR and NOT (also written &&, || and !) are also supported but beware that they do not honor the usual precedence rules, so parentheses should be used whenever multiple operators are used together.

    So, in the case of query #2, the original:

    * AND name:\"NAME 1\" OR name:\"NAME 2\" OR name:\"NAME 3\"
    

    is internally converted to the Lucene query:

    +*:* +name:\"name 1\" name:\"name 2\" name:\"name 3\"
    

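    Here a + prefix marks a clause as required, while the unprefixed clauses are optional. Every hit must therefore match name 1, and the other two names only influence scoring. As such, it cannot be understood as: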

    true && false || true || true
    

    which, in standard boolean logic, would indeed evaluate to true.

    Long story short, use parentheses whenever applicable.


    BTW, the terms in the Lucene query were lowercased (analyzed by the standard analyzer) because you specified analyze_wildcard:true. This may or may not be exactly what you'd want so I thought it was worth mentioning!
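
To make the required/optional conversion described in the answer concrete, here is a minimal Python sketch of it. It is a simplified model, not Lucene's actual parser: the names `assign_occurs`, `matches` and `search` are hypothetical, a query is given as a pre-tokenized list, a nested list stands in for a parenthesized group, and each "document" is just its name value. The rule it assumes is that AND marks the clauses on both of its sides as required, while OR leaves them optional.

```python
DOCS = ["NAME 1", "NAME 2", "NAME 3"]  # toy index: one name per document


def assign_occurs(tokens):
    """Turn [term, op, term, ...] into [(term, required), ...]."""
    clauses = []
    conj = None  # operator seen just before the current term
    for tok in tokens:
        if isinstance(tok, str) and tok in ("AND", "OR"):
            conj = tok
            continue
        if conj == "AND" and clauses:
            # AND promotes the clause on its left to required ...
            prev_term, _ = clauses[-1]
            clauses[-1] = (prev_term, True)
        # ... and makes the clause on its right required as well.
        clauses.append((tok, conj == "AND"))
        conj = None
    return clauses


def matches(name, tokens):
    """Check one document against the flattened required/optional clauses."""
    clauses = assign_occurs(tokens)

    def hit(term):
        if isinstance(term, list):  # parenthesized group: evaluate on its own
            return matches(name, term)
        return term == "*" or term == name

    required = [t for t, req in clauses if req]
    optional = [t for t, req in clauses if not req]
    if required:
        # Optional clauses no longer decide whether a document matches.
        return all(hit(t) for t in required)
    return any(hit(t) for t in optional)


def search(tokens):
    return [name for name in DOCS if matches(name, tokens)]


Q1 = ["*", "AND", ["NAME 1", "OR", "NAME 2", "OR", "NAME 3"]]
Q2 = ["*", "AND", "NAME 1", "OR", "NAME 2", "OR", "NAME 3"]
Q3 = ["NAME 1", "OR", "NAME 2", "OR", "NAME 3", "AND", "*"]

print(search(Q1))  # ['NAME 1', 'NAME 2', 'NAME 3']
print(search(Q2))  # ['NAME 1']
print(search(Q3))  # ['NAME 3']
```

Under these assumptions the model reproduces the three results reported in the question, and replacing "NAME 1" in Q2 with a name that is not in the toy index yields an empty list, because that clause is required.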