elasticsearch, solr, elasticsearch-painless

Postprocessing Elastic results with another search (migrating from Solr)


I'm currently migrating an application from Solr to Elastic and stumbled over an interesting Solr feature that I cannot reproduce in Elastic: the Solr query returns a postprocessing flag that performs a quality check on each result, indicating whether all query tokens are found in the result field.

q  = some_field:(the brown fox)
fl = some_field, full_match:exists(query({!edismax v='some_field:(the brown fox)' mm='100%'}))

The Solr result looks as follows:

{
    "response": {
        "docs": [
            {
                "some_field": "The Brown Bear",
                "full_match": false
            },
            {
                "some_field": "The Quick Brown Fox",
                "full_match": true
            }
        ]
    }
}

The flag is used by the client to further process the result documents, independent of the score (which I omitted in the example). I found this quite smart, as the tokenization and distributed computation power of Solr is used instead of doing everything in the client.

Now in Elastic I assume this should be done in the script_fields block, but I have no clue how to perform a subquery with a Painless script, and after two days of investigation I doubt that this is possible at all:

{
    "query": {
        "match": {
            "some_field": "the brown fox"
        }
    },
    "_source": [
        "some_field"
    ],
    "script_fields": {
        "full_match": {
            "script": "???" <-- Search with Painless script?
        }
    }
}

Any creative ideas are welcome.


Solution

  • How about using Elasticsearch's named queries in combination with the minimum_should_match parameter and setting that to 100% to match only documents where all tokens match?

    You would then be able to detect, in the response, the documents for which all tokens match. You can also set "boost": 0 on that clause so it does not affect the score of your main query.

    Here's an example request (it uses a field named message where your question uses some_field):

    {
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            "message": {
                                "query": "the brown fox",
                                "_name": "main_query"
                            }
                        }
                    },
                    {
                        "match": {
                            "message": {
                                "query": "the brown fox",
                                "_name": "all_tokens_match",
                                "minimum_should_match": "100%",
                                "boost": 0
                            }
                        }
                    }
                ]
            }
        }
    }
    

    You would then get a response that looks a bit like this:

    {
        "hits": [
            {
                "_score": 0.99938476,
                "_source": {
                    "message": "The Quick Brown Fox"
                },
                "matched_queries": [
                    "main_query",
                    "all_tokens_match"
                ]
            },
            {
                "_score": 0.38727614,
                "_source": {
                    "message": "The Brown Bear"
                },
                "matched_queries": [
                    "main_query"
                ]
            }
        ]
    }
    

    Documents in which all tokens of your query match will then have all_tokens_match included in the matched_queries part of the response.
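
To get back the boolean full_match flag that the Solr response had, the client can check each hit's matched_queries list for the named query. Here is a minimal Python sketch of that idea, working on plain dictionaries so that it does not depend on any particular client library. The helper names build_request and add_full_match are hypothetical, as is the choice to return Solr-style documents. In a real Elasticsearch response the list of hits is found under response["hits"]["hits"], not directly under hits as in the abbreviated example above.

```python
from typing import Any, Dict, List

# Name of the zero-boost clause; must match the "_name" used in the request.
ALL_TOKENS_QUERY = "all_tokens_match"


def build_request(field: str, text: str) -> Dict[str, Any]:
    """Build the bool query from the answer for an arbitrary field and text."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Scoring clause: matches if any token matches.
                    {"match": {field: {"query": text, "_name": "main_query"}}},
                    # Flag clause: matches only if every token matches;
                    # boost 0 keeps it from changing the score.
                    {
                        "match": {
                            field: {
                                "query": text,
                                "_name": ALL_TOKENS_QUERY,
                                "minimum_should_match": "100%",
                                "boost": 0,
                            }
                        }
                    },
                ]
            }
        },
        "_source": [field],
    }


def add_full_match(hits: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Turn search hits into Solr-style docs with a boolean full_match flag."""
    docs = []
    for hit in hits:
        # matched_queries may be missing, so default to an empty list.
        matched = hit.get("matched_queries", [])
        docs.append(
            {
                field: hit.get("_source", {}).get(field),
                "full_match": ALL_TOKENS_QUERY in matched,
            }
        )
    return docs
```

With the two example hits above, add_full_match(hits, "message") marks "The Quick Brown Fox" with full_match true and "The Brown Bear" with full_match false, which mirrors the Solr output from the question.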