I am working on a medical project where I have multiple questions, each with topics attached to it. The following query runs fine, but it ignores the 'must_not' filter, whereas the 'must' clause works as expected. How can I fix this?
GET stopdata/_search
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "query": {
            "match": {
              "question": "Hello Dr. Iam suffering from fever with cough nd cold since 3 days"
            }
          }
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "topics": [
                  "fever",
                  "cough"
                ]
              }
            }
          ],
          "must_not": [
            {
              "terms": {
                "topics": [
                  "children",
                  "child",
                  "childrens health"
                ]
              }
            }
          ]
        }
      },
      "random_score": {}
    }
  },
  "highlight": {
    "fields": {
      "keyword": {}
    }
  }
}
I also need to convert the query to Java. I have tried, but I am stuck with the following code:
Set<String> mustNot = new HashSet<String>();
mustNot.add("child");
mustNot.add("children");
mustNot.add("childrens health");
Set<String> must = new HashSet<String>();
must.add("fever");
must.add("cough");
FunctionScoreQueryBuilder fsqb = new FunctionScoreQueryBuilder(QueryBuilders.matchQuery("question", "Hello Dr. Iam suffering from fever with cough nd cold since 3 days"));
fsqb.add(ScoreFunctionBuilders.randomFunction((new Date()).getTime()));
BoolQueryBuilder bqb = boolQuery()
    .mustNot(termsQuery("topics", mustNot));
SearchResponse response1 = client.prepareSearch("stopdata")
    .setQuery(fsqb)
    .execute()
    .actionGet();
System.out.println(response1.getHits().getTotalHits());
The mapping of the 'stopdata' index is as follows:
{
  "stopdata": {
    "mappings": {
      "questions": {
        "properties": {
          "answers": {
            "type": "string"
          },
          "id": {
            "type": "long"
          },
          "question": {
            "type": "string",
            "analyzer": "my_english"
          },
          "relevantQuestions": {
            "type": "long"
          },
          "topics": {
            "type": "string"
          }
        }
      }
    }
  }
}
Here is some sample data for the above index:
"question": "My son of age 8 months is suffering from cough and cold and fever. What treatment I have to follow?"
"topics": [
  "Cough",
  "Fever",
  "Hydration",
  "Nutrition",
  "Tens",
  "Childrens health"
]
"question": "Hi.My daughter, 4 years old , has on and of fever with severe coughing and colds for 3 days now.She vomited as well last night.Do you think it's viral?"
"topics": [
  "Vomiting",
  "Flu",
  "Cough",
  "Fever",
  "Pneumonia",
  "Meningitis",
  "Tamiflu",
  "Incision",
  "Childrens health",
  "Oseltamivir"
]
"question": "If you have a fever of 101 with chills and sweats for 2 day with a slight cough, should you go to the drs or let is wear off?"
"topics": [
  "Cough",
  "Fever"
]
The problem I see is that the whole filter part is misplaced: it should go inside the filtered query, because there is no filter element at the root of the function_score element (see official docs). So your query should look like the one below. You should also use POST instead of GET, since you're sending a payload:
POST stopdata/_search
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "query": {
            "match": {
              "question": "Hello Dr. Iam suffering from fever with cough nd cold since 3 days"
            }
          },
          "filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "topics": [
                      "fever",
                      "cough"
                    ]
                  }
                }
              ],
              "must_not": [
                {
                  "terms": {
                    "topics": [
                      "children",
                      "child",
                      "childrens health"
                    ]
                  }
                }
              ]
            }
          }
        }
      },
      "random_score": {}
    }
  },
  "highlight": {
    "fields": {
      "keyword": {}
    }
  }
}
Now to write all this in Java, it goes like this:
// Topics to exclude (must_not) and topics to require (must)
Set<String> mustNot = new HashSet<String>();
mustNot.add("child");
mustNot.add("children");
mustNot.add("childrens health");
Set<String> must = new HashSet<String>();
must.add("fever");
must.add("cough");
// Full-text match on the question field
MatchQueryBuilder query = QueryBuilders.matchQuery("question", "Hello Dr. Iam suffering from fever with cough nd cold since 3 days");
// Bool filter combining both terms filters
BoolFilterBuilder filter = FilterBuilders.boolFilter()
    .must(FilterBuilders.termsFilter("topics", must))
    .mustNot(FilterBuilders.termsFilter("topics", mustNot));
// The filter goes inside the filtered query, not next to it
FilteredQueryBuilder fqb = QueryBuilders.filteredQuery(query, filter);
// Wrap everything in function_score and add a random score seeded with the current time
FunctionScoreQueryBuilder fsqb = QueryBuilders.functionScoreQuery(fqb);
fsqb.add(ScoreFunctionBuilders.randomFunction((new Date()).getTime()));
SearchResponse response1 = client.prepareSearch("stopdata")
    .setQuery(fsqb)
    .execute()
    .actionGet();
System.out.println(response1.getHits().getTotalHits());
UPDATE
The reason why the must_not doesn't match childrens health is that the topics field is analyzed, so Childrens health gets tokenized and analyzed as two tokens, childrens and health. A terms filter compares its values against the indexed tokens without analyzing them, so a terms match on childrens health will not yield anything. Splitting it into two terms might help:
"must_not": [
  {
    "terms": {
      "topics": [
        "children",
        "child",
        "childrens",
        "health"
      ]
    }
  }
]
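To make the effect of analysis easier to see, here is a minimal, self-contained Java sketch that does not use Elasticsearch at all. The class TermsFilterSketch and its methods analyze, termsMatch and passesFilter are hypothetical names invented for this illustration, and analyze is only a rough stand-in for the default analyzer (lowercase, then split on non-alphanumeric characters). It assumes the terms filter semantics described above: a value matches only if it equals an indexed token exactly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: simulates how an analyzed field interacts with a terms filter.
class TermsFilterSketch {

    // Rough stand-in for the default analyzer (an assumption, not the real implementation):
    // lowercase each value and split it on anything that is not a letter or digit.
    static Set<String> analyze(List<String> values) {
        Set<String> tokens = new HashSet<>();
        for (String value : values) {
            for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    // A terms filter matches when at least one term equals an indexed token exactly.
    // The terms themselves are NOT analyzed.
    static boolean termsMatch(Set<String> indexedTokens, List<String> terms) {
        for (String term : terms) {
            if (indexedTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // bool filter: the must terms have to match and the must_not terms must not.
    static boolean passesFilter(List<String> topics, List<String> must, List<String> mustNot) {
        Set<String> tokens = analyze(topics);
        return termsMatch(tokens, must) && !termsMatch(tokens, mustNot);
    }

    public static void main(String[] args) {
        List<String> childTopics = Arrays.asList(
                "Cough", "Fever", "Hydration", "Nutrition", "Tens", "Childrens health");
        List<String> adultTopics = Arrays.asList("Cough", "Fever");
        List<String> must = Arrays.asList("fever", "cough");
        List<String> originalMustNot = Arrays.asList("children", "child", "childrens health");
        List<String> splitMustNot = Arrays.asList("children", "child", "childrens", "health");

        // "childrens health" is not an indexed token, so the child document is not excluded.
        System.out.println(passesFilter(childTopics, must, originalMustNot)); // true
        // "childrens" is an indexed token, so the child document is excluded.
        System.out.println(passesFilter(childTopics, must, splitMustNot));    // false
        // The adult document passes either way.
        System.out.println(passesFilter(adultTopics, must, splitMustNot));    // true
    }
}
```

Note that in this sketch, as in the split terms list above, adding health on its own would also exclude any document whose topics contain the word health, which may be broader than intended.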