I'm currently migrating an application from Solr to Elastic and stumbled over an interesting Solr feature that I cannot reproduce in Elastic: the Solr query returns a post-processing flag that does a quality check on each result, indicating whether all query tokens are found in the result field.
q = some_field:(the brown fox)
fl = some_field, full_match:exists(query({!edismax v='some_field:(the brown fox)' mm='100%'}))
The Solr result looks as follows:
{
  "response": {
    "docs": [
      {
        "some_field": "The Brown Bear",
        "full_match": false
      },
      {
        "some_field": "The Quick Brown Fox",
        "full_match": true
      }
    ]
  }
}
The flag is used by the client to further process the result documents, independent of the score (which I omitted in the example). I found this quite smart, as the tokenization and distributed computation power of Solr is used instead of doing everything in the client.
Now in Elastic I assume this should be done in the script_fields block, but I have no clue how to perform a subquery with a Painless script, and after two days of investigation I doubt that this is possible at all:
{
  "query": {
    "match": {
      "some_field": "the brown fox"
    }
  },
  "_source": [
    "some_field"
  ],
  "script_fields": {
    "full_match": {
      "script": "???" <-- Search with Painless script?
    }
  }
}
Any creative ideas are welcome.
How about using Elasticsearch's named queries in combination with the minimum_should_match parameter and setting that to 100% to match only documents where all tokens match?
You would then be able to detect queries where all tokens match in the response. You can also set "boost": 0 to avoid affecting the score of your main query.
Here's an example request:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "message": {
              "query": "the brown fox",
              "_name": "main_query"
            }
          }
        },
        {
          "match": {
            "message": {
              "query": "the brown fox",
              "_name": "all_tokens_match",
              "minimum_should_match": "100%",
              "boost": 0
            }
          }
        }
      ]
    }
  }
}
You would then get a response that looks a bit like this:
{
  "hits": [
    {
      "_score": 0.99938476,
      "_source": {
        "message": "The Quick Brown Fox"
      },
      "matched_queries": [
        "main_query",
        "all_tokens_match"
      ]
    },
    {
      "_score": 0.38727614,
      "_source": {
        "message": "The Brown Bear"
      },
      "matched_queries": [
        "main_query"
      ]
    }
  ]
}
Documents in which all tokens of your query match will then have all_tokens_match included in the matched_queries part of the response.
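If the client needs the same boolean flag that Solr returned, it could be derived from matched_queries after the search. The following is only a minimal sketch: the helper name add_full_match_flag is made up for illustration, and it assumes you pass in the list of hit objects in the simplified shape shown above (in a real response that list is usually nested under hits.hits). A hit without a matched_queries entry is treated as not matching.

```python
def add_full_match_flag(hits, flag_query_name="all_tokens_match"):
    """Return one dict per hit: the _source fields plus a full_match flag.

    `hits` is assumed to be a list of hit objects shaped like the
    simplified response above; `flag_query_name` is the _name given to
    the minimum_should_match=100% clause in the request.
    """
    docs = []
    for hit in hits:
        # Copy the source so the original response is left untouched.
        doc = dict(hit.get("_source", {}))
        # A missing matched_queries entry is treated as "no named query matched".
        doc["full_match"] = flag_query_name in hit.get("matched_queries", [])
        docs.append(doc)
    return docs


# Sample data copied from the simplified response above.
hits = [
    {
        "_source": {"message": "The Quick Brown Fox"},
        "matched_queries": ["main_query", "all_tokens_match"],
    },
    {
        "_source": {"message": "The Brown Bear"},
        "matched_queries": ["main_query"],
    },
]

docs = add_full_match_flag(hits)
for doc in docs:
    print(doc)
```

This keeps tokenization and matching on the Elasticsearch side, as in the Solr setup; the client only checks whether the query name is present.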