I have to filter/query multiple URLs on the "url" field using the Elasticsearch RestHighLevelClient. I formed my query as below, but it returns 0 records.
query.must(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.AND)
.boost(1.0f));
query.must(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.AND)
.boost(1.0f));
If I change it as below, I only get records matching the URL http://localhost:8080/test/*, where I keep Operator.AND for that query and set Operator.OR for the query on http://www.bbc.com/*.
query.must(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.AND)
.boost(1.0f));
query.must(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.OR)
.boost(1.0f));
So it is ignoring the http://www.bbc.com/* filter.
Am I making a mistake here? How do I write multiple queries on the same field?
You should use a should clause rather than a must clause in your bool query.
The first query you perform actually requires both url values to be present in the same document: it will match if and only if a single document has both url: http://localhost:8080/test/ and url: http://www.bbc.com/.
This behaviour is normal for a bool query, and is not specific to BoolQueryBuilder, which I assume you are using.
In fact, you should use BoolQueryBuilder.should() to put these two queries in a logical OR:
query.should(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.AND)
.boost(1.0f));
query.should(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.AND)
.boost(1.0f));
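If it helps to see the difference in isolation, here is a toy model in plain Java with no Elasticsearch involved. It is only a sketch: startsWith stands in for the trailing wildcard, each document is assumed to have a single url value, and the class and method names are made up for this illustration.

```java
import java.util.List;
import java.util.function.Predicate;

public class BoolClauseSketch {
    // Stand-ins for the two wildcard queries; startsWith approximates the trailing "*".
    static final Predicate<String> LOCAL = url -> url.startsWith("http://localhost:8080/test/");
    static final Predicate<String> BBC = url -> url.startsWith("http://www.bbc.com/");

    // must + must: every clause has to match the same document (logical AND).
    static boolean matchesMust(String url) {
        return LOCAL.and(BBC).test(url);
    }

    // should + should with no must clause: at least one clause has to match (logical OR).
    static boolean matchesShould(String url) {
        return LOCAL.or(BBC).test(url);
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://localhost:8080/test/a",
                "http://www.bbc.com/news",
                "http://example.org/");
        for (String url : urls) {
            System.out.println(url + " must=" + matchesMust(url) + " should=" + matchesShould(url));
        }
    }
}
```

A single url value cannot start with both prefixes, so the must version rejects every document, which is consistent with the 0 records in the question.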
As you pointed out in the comment, your query is actually more complex: it must match one of the URL patterns, and results should be boosted if they also match the content.
In order to achieve this you should use two nested bool queries like this:
BoolQueryBuilder urlQuery = QueryBuilders.boolQuery();
urlQuery.should(QueryBuilders.queryStringQuery("http://localhost:8080/test/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.AND)
.boost(1.0f));
urlQuery.should(QueryBuilders.queryStringQuery("http://www.bbc.com/*")
.field("url")
.lenient(true)
.escape(true)
.analyzeWildcard(true)
.fuzziness(Fuzziness.ZERO)
.defaultOperator(Operator.AND)
.boost(1.0f));
WildcardQueryBuilder wildcardQuery = QueryBuilders.wildcardQuery("content", "anyt*");
// here `query` is your original bool query
query.must(urlQuery);
query.should(wildcardQuery);
Elasticsearch will interpret this query as something like:
fetch documents that must match either url query #1 or url query #2, and rank higher those that match wildcardQuery.
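As a rough illustration of that interpretation, the following plain-Java sketch filters on the URL condition and then ranks content matches first. It does not use Elasticsearch at all: the Doc class, the fixed scores 1 and 2, and the word-prefix check standing in for the anyt* wildcard are all assumptions made for this example, and real relevance scoring is far more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NestedBoolSketch {
    // Hypothetical document shape: just the two fields used in the query.
    public static final class Doc {
        public final String url;
        public final String content;
        public Doc(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Required part: the nested bool with two should clauses (url #1 OR url #2).
    static boolean urlMatches(Doc d) {
        return d.url.startsWith("http://localhost:8080/test/")
                || d.url.startsWith("http://www.bbc.com/");
    }

    // Optional part: approximates the wildcard "anyt*" against each word of the content.
    static boolean contentMatches(Doc d) {
        for (String word : d.content.toLowerCase().split("\\s+")) {
            if (word.startsWith("anyt")) {
                return true;
            }
        }
        return false;
    }

    // The outer should clause never excludes a hit; it only raises its score.
    static int score(Doc d) {
        return contentMatches(d) ? 2 : 1;
    }

    static List<Doc> search(List<Doc> docs) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (urlMatches(d)) {
                hits.add(d);
            }
        }
        // Highest score first; the sort is stable, so ties keep their input order.
        hits.sort(Comparator.comparingInt(NestedBoolSketch::score).reversed());
        return hits;
    }
}
```

A document whose content matches but whose url does not is still excluded, because only the URL part is required.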
What does defaultOperator have to do with all this?
The .defaultOperator(Operator.OR) is actually just Elasticsearch trying to confuse you: it has nothing to do with uniting two queries in a logical OR, but instead is a parameter of the query string query:
default_operator
(Optional, string) Default boolean logic used to interpret text in the query string if no operators are specified.
This parameter actually tells Elasticsearch how to interpret the tokens inside the string that you pass to queryStringQuery(). You can think of the query string as a query written in the Lucene query language.
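To make the role of the default operator concrete, here is a simplified model in plain Java. It is a sketch under heavy assumptions: analysis is reduced to lowercasing and splitting on whitespace, the Operator enum is a local stand-in rather than the Elasticsearch class, and explicit operators inside the query string are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DefaultOperatorSketch {
    // Local enum for this sketch only, not the Elasticsearch Operator class.
    enum Operator { AND, OR }

    // Very rough stand-in for analysis: lowercase and split on whitespace.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
    }

    // With AND every query token must occur in the field; with OR one is enough.
    static boolean matches(String queryString, String fieldValue, Operator defaultOperator) {
        Set<String> field = tokens(fieldValue);
        Set<String> query = tokens(queryString);
        if (defaultOperator == Operator.AND) {
            return field.containsAll(query);
        }
        for (String token : query) {
            if (field.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model the operator only changes how the tokens of one query string are combined. It has no say in how that query relates to its sibling clauses in the bool query.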
Hope that helps!