Elasticsearch Java High Level Rest Client constructing a boolean query with multiple match values and OR condition


I am trying to build a query with the Java High Level REST Client that takes a list of ids and returns every document whose id matches any of them, akin to a WHERE clause with an OR operator.

For this reason I have been using a bool query: I iterate over the list and add a must clause with a match query for each value, with the operator set to OR.

BoolQueryBuilder bool = QueryBuilders.boolQuery();
ids.forEach(i -> {
     bool.must(QueryBuilders.matchQuery("_id", i).operator(Operator.OR));
});

return bool;

// this produces the following DSL

{
  "bool" : {
    "must" : [
      {
        "match" : {
          "_id" : {
            "query" : "0025370c-baea-4dcc-af48-56c4bdb86854",
            "operator" : "OR",
            "prefix_length" : 0,
            "max_expansions" : 50,
            "fuzzy_transpositions" : true,
            "lenient" : false,
            "zero_terms_query" : "NONE",
            "auto_generate_synonyms_phrase_query" : true,
            "boost" : 1.0
          }
        }
      },
      {
        "match" : {
          "_id" : {
            "query" : "013fedef-6b04-4520-8458-fca8b0366833",
            "operator" : "OR",
            "prefix_length" : 0,
            "max_expansions" : 50,
            "fuzzy_transpositions" : true,
            "lenient" : false,
            "zero_terms_query" : "NONE",
            "auto_generate_synonyms_phrase_query" : true,
            "boost" : 1.0
          }
        }
      },
      {
        "match" : {
          "_id" : {
            "query" : "01c44ce4-0e87-4dc9-8a29-1f24679d335f",
            "operator" : "OR",
            "prefix_length" : 0,
            "max_expansions" : 50,
            "fuzzy_transpositions" : true,
            "lenient" : false,
            "zero_terms_query" : "NONE",
            "auto_generate_synonyms_phrase_query" : true,
            "boost" : 1.0
          }
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

The query is built without errors, but it doesn't return the expected documents. I think the OR is nested too low and doesn't get applied across the multiple matches, so I assumed a nested bool query was needed and tried this:

BoolQueryBuilder bool = QueryBuilders.boolQuery();
BoolQueryBuilder subBool = QueryBuilders.boolQuery();
ids.forEach(i -> {
     subBool.must(QueryBuilders.matchQuery("_id", i).operator(Operator.OR));
});

bool.must(subBool);

return bool;

// It would make more sense to me to place the operator condition on bool instead of subBool, but that option is not available, so I am sure I am going about this the wrong way.

{
  "bool" : {
    "must" : [
      {
        "bool" : {
          "must" : [
            {
              "match" : {
                "_id" : {
                  "query" : "0025370c-baea-4dcc-af48-56c4bdb86854",
                  "operator" : "OR",
                  "prefix_length" : 0,
                  "max_expansions" : 50,
                  "fuzzy_transpositions" : true,
                  "lenient" : false,
                  "zero_terms_query" : "NONE",
                  "auto_generate_synonyms_phrase_query" : true,
                  "boost" : 1.0
                }
              }
            },
            {
              "match" : {
                "_id" : {
                  "query" : "013fedef-6b04-4520-8458-fca8b0366833",
                  "operator" : "OR",
                  "prefix_length" : 0,
                  "max_expansions" : 50,
                  "fuzzy_transpositions" : true,
                  "lenient" : false,
                  "zero_terms_query" : "NONE",
                  "auto_generate_synonyms_phrase_query" : true,
                  "boost" : 1.0
                }
              }
            },
            {
              "match" : {
                "_id" : {
                  "query" : "01c44ce4-0e87-4dc9-8a29-1f24679d335f",
                  "operator" : "OR",
                  "prefix_length" : 0,
                  "max_expansions" : 50,
                  "fuzzy_transpositions" : true,
                  "lenient" : false,
                  "zero_terms_query" : "NONE",
                  "auto_generate_synonyms_phrase_query" : true,
                  "boost" : 1.0
                }
              }
            }
          ],
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

This seems to work if I reduce it to a single value in the nested match (one id instead of the whole list), so I still think I am implementing the OR condition incorrectly.

Using filter clauses in the bool query instead of must clauses yields the same result. I appreciate any help.


Solution

  • The OR operator in a match query means that only one term of that sub-query's query string has to match the document for the sub-query to match, so it is not what you are aiming for. To combine the sub-queries with OR, use should instead of must in your root bool query. must is the Elasticsearch equivalent of the AND operator, while should means OR.
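
The change described above might look like the following minimal sketch. It assumes the Elasticsearch 7.x server library (org.elasticsearch.index.query) is on the classpath, as it is when using the High Level REST Client. The class name IdOrQueryExample and both method names are hypothetical, chosen only for illustration. The second method shows an alternative that was not part of the original answer: an ids query, which expresses "any of these ids" in a single clause.

```java
import java.util.Arrays;
import java.util.List;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// Hypothetical example class; the names are illustrative only.
public class IdOrQueryExample {

    // OR across ids: each id becomes a "should" clause of the root bool query.
    static BoolQueryBuilder anyIdWithShould(List<String> ids) {
        BoolQueryBuilder bool = QueryBuilders.boolQuery();
        // The per-match operator is left at its default, because it only
        // governs the terms inside one query string, not the combination
        // of clauses.
        ids.forEach(id -> bool.should(QueryBuilders.matchQuery("_id", id)));
        // A bool query with only should clauses already requires one of them
        // to match; stating it explicitly keeps that behaviour if must or
        // filter clauses are added later.
        bool.minimumShouldMatch(1);
        return bool;
    }

    // Alternative (an assumption, not from the answer): a single ids query.
    static QueryBuilder anyIdWithIdsQuery(List<String> ids) {
        return QueryBuilders.idsQuery().addIds(ids.toArray(new String[0]));
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "0025370c-baea-4dcc-af48-56c4bdb86854",
                "013fedef-6b04-4520-8458-fca8b0366833",
                "01c44ce4-0e87-4dc9-8a29-1f24679d335f");

        // toString() renders the query as JSON; no cluster connection is needed.
        System.out.println(anyIdWithShould(ids));
        System.out.println(anyIdWithIdsQuery(ids));
    }
}
```

Printing the first query should show the three match clauses under "should" rather than "must", which is the only structural difference from the DSL in the question.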