Query the same field with multiple values using RestHighLevelClient


I need to filter on multiple URLs in the "url" field using the Elasticsearch RestHighLevelClient. I built the query below, but it returns 0 records.

query.must(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
        .field("url")
        .lenient(true)
        .escape(true)
        .analyzeWildcard(true)
        .fuzziness(Fuzziness.ZERO)
        .defaultOperator(Operator.AND)
        .boost(1.0f));
query.must(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
        .field("url")
        .lenient(true)
        .escape(true)
        .analyzeWildcard(true)
        .fuzziness(Fuzziness.ZERO)
        .defaultOperator(Operator.AND)
        .boost(1.0f));

If I change the second query to Operator.OR, as shown below, I get only the records matching http://localhost:8080/test/*, which still uses Operator.AND.

query.must(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
        .field("url")
        .lenient(true)
        .escape(true)
        .analyzeWildcard(true)
        .fuzziness(Fuzziness.ZERO)
        .defaultOperator(Operator.AND)
        .boost(1.0f));
query.must(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
        .field("url")
        .lenient(true)
        .escape(true)
        .analyzeWildcard(true)
        .fuzziness(Fuzziness.ZERO)
        .defaultOperator(Operator.OR)
        .boost(1.0f));

So it is ignoring the http://www.bbc.com/* filter.

Am I making a mistake here? How do I write multiple queries on the same field?


Solution

  • You should use should() rather than must() in your bool query.

    What happens in the original query?

    The first query requires both URL patterns to match the same document: it matches if and only if a single document's url field matches both http://localhost:8080/test/* and http://www.bbc.com/*. If the field holds a single URL, that can never happen, which is why you get 0 records.

    This is standard behaviour for a bool query and is not specific to BoolQueryBuilder, which I assume you are using.
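To make the difference concrete, here is a minimal plain-Java sketch of the two semantics. It is a model only and does not use the Elasticsearch API: the class name `BoolSemanticsSketch` and its two methods are hypothetical, and the wildcard patterns are approximated with `startsWith`.

```java
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of bool-query semantics; this is NOT the Elasticsearch API.
final class BoolSemanticsSketch {

    // Models "must": every clause has to match the document (logical AND).
    static <T> boolean matchesMust(T doc, List<Predicate<T>> clauses) {
        return clauses.stream().allMatch(clause -> clause.test(doc));
    }

    // Models "should" in a bool query with no must clauses:
    // at least one clause has to match (logical OR).
    static <T> boolean matchesShould(T doc, List<Predicate<T>> clauses) {
        return clauses.stream().anyMatch(clause -> clause.test(doc));
    }

    public static void main(String[] args) {
        // The two URL patterns from the question, approximated as prefix checks.
        Predicate<String> localhost = url -> url.startsWith("http://localhost:8080/test/");
        Predicate<String> bbc = url -> url.startsWith("http://www.bbc.com/");
        List<Predicate<String>> clauses = List.of(localhost, bbc);

        String doc = "http://www.bbc.com/news";
        System.out.println(matchesMust(doc, clauses));   // prints "false"
        System.out.println(matchesShould(doc, clauses)); // prints "true"
    }
}
```

A single URL cannot satisfy both prefix checks, so the AND variant rejects every document, which mirrors the 0 records in the question.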

    How to do a logical OR of two queries?

    In fact, you should use BoolQueryBuilder.should() to put these two queries in a logical OR:

    query.should(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
            .field("url")
            .lenient(true)
            .escape(true)
            .analyzeWildcard(true)
            .fuzziness(Fuzziness.ZERO)
            .defaultOperator(Operator.AND)
            .boost(1.0f));
    query.should(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
            .field("url")
            .lenient(true)
            .escape(true)
            .analyzeWildcard(true)
            .fuzziness(Fuzziness.ZERO)
            .defaultOperator(Operator.AND)
            .boost(1.0f));
    

    How can I combine this with other parts of my query?

    As you pointed out in the comment, your query is actually more complex: it must match one of the URL patterns, and results should be boosted if they also match the content.

    In order to achieve this you should use two nested bool queries like this:

    // Inner bool query: a document must match at least one of the URL patterns
    BoolQueryBuilder urlQuery = new BoolQueryBuilder();

    urlQuery.should(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
            .field("url")
            .lenient(true)
            .escape(true)
            .analyzeWildcard(true)
            .fuzziness(Fuzziness.ZERO)
            .defaultOperator(Operator.AND)
            .boost(1.0f));

    urlQuery.should(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
            .field("url")
            .lenient(true)
            .escape(true)
            .analyzeWildcard(true)
            .fuzziness(Fuzziness.ZERO)
            .defaultOperator(Operator.AND)
            .boost(1.0f));

    // Optional clause: does not filter, only boosts documents whose content matches
    WildcardQueryBuilder wildcardQuery = QueryBuilders.wildcardQuery("content", "anyt*");

    // here `query` is your original bool query
    query.must(urlQuery);
    query.should(wildcardQuery);
    

    Elasticsearch will interpret this query as something like:

    fetch documents that must match either url query #1 or url query #2, and rank higher those that match wildcardQuery
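The "must filters, should only ranks" behaviour can be sketched in plain Java as follows. This is a simplified model and not the Elasticsearch API: the `NestedBoolSketch` class, its `Doc` type and the fixed scores are hypothetical, the wildcard patterns are approximated with `startsWith`, and real Elasticsearch scoring also takes the must clause into account.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a bool query with one must clause and one should clause.
final class NestedBoolSketch {

    // Hypothetical document shape used only for this illustration.
    static final class Doc {
        final String url;
        final String content;

        Doc(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Models the inner bool query: either URL pattern is enough.
    static boolean urlMatches(Doc doc) {
        return doc.url.startsWith("http://localhost:8080/test/")
                || doc.url.startsWith("http://www.bbc.com/");
    }

    // Models the optional should clause: it raises the score but never excludes.
    static int score(Doc doc) {
        return doc.content.startsWith("anyt") ? 2 : 1;
    }

    // Keeps only documents that pass the must clause, highest score first.
    static List<Doc> search(List<Doc> docs) {
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (urlMatches(doc)) {
                hits.add(doc);
            }
        }
        hits.sort((a, b) -> Integer.compare(score(b), score(a)));
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> hits = search(List.of(
                new Doc("http://www.bbc.com/a", "other text"),
                new Doc("http://example.org/", "anything"),
                new Doc("http://localhost:8080/test/x", "anything")));
        for (Doc hit : hits) {
            System.out.println(hit.url);
        }
    }
}
```

In this model the example.org document is dropped even though its content matches, because the content clause only affects ranking among documents that already passed the URL filter.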

    What does defaultOperator have to do with all this?

    The .defaultOperator(Operator.OR) setting is actually just Elasticsearch trying to confuse you: it has nothing to do with combining two queries in a logical OR. Instead, it is a parameter of the query string query itself:

    default_operator

    (Optional, string) Default boolean logic used to interpret text in the query string if no operators are specified.

    This parameter tells Elasticsearch how to combine the terms inside the string you pass to queryStringQuery() when that string contains no explicit operators. You can think of the query string as a query written in Lucene query syntax.
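As a rough illustration of what the default operator controls, here is a plain-Java sketch. It is a model under stated assumptions and not the Elasticsearch API: the `DefaultOperatorSketch` class and its `DefaultOp` enum are hypothetical, and the query string is simply split on whitespace, whereas Elasticsearch runs it through an analyzer.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of default_operator; this is NOT the Elasticsearch API.
final class DefaultOperatorSketch {

    // Hypothetical stand-in for the AND/OR choice of default_operator.
    enum DefaultOp { AND, OR }

    // Decides whether ONE query string matches a document's terms.
    // The operator only controls how the terms of that single string combine.
    static boolean matches(String queryString, DefaultOp op, List<String> docTerms) {
        String[] terms = queryString.trim().split("\\s+");
        if (op == DefaultOp.AND) {
            return Arrays.stream(terms).allMatch(docTerms::contains);
        }
        return Arrays.stream(terms).anyMatch(docTerms::contains);
    }

    public static void main(String[] args) {
        List<String> docTerms = List.of("quick", "dog");
        System.out.println(matches("quick fox", DefaultOp.AND, docTerms)); // prints "false"
        System.out.println(matches("quick fox", DefaultOp.OR, docTerms));  // prints "true"
    }
}
```

The operator never looks beyond its own query string, so changing it on one clause cannot turn two separate must clauses into an OR.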

    Hope that helps!