Tags: ruby-on-rails, elasticsearch, geo, searchkick

Rails elasticsearch _geo_distance and custom scoring/sorting


My Rails app uses elasticsearch (via searchkick) and works fine with the _geo_distance ordering function. However, I need a more complex ordering that combines location with promoting an exact string match on the business name.

For example, if a query returns 10 results in ascending distance order, but the #5 result is also an exact string match on the business name in the record, I would like to promote that result to the #1 position (basically overriding the distance sorting for that record).

There are two ways I can see to try to solve this issue, but I am running into issues with both.

First, would be to do this on the initial query, so that elasticsearch handles the work.

Second, would be to do some type of post-process re-sort on the result returned by elasticsearch to look for an exact match and re-order if needed.

The issue with the first method is that the built-in scoring mechanisms seem to shift completely to distance when invoking _geo_distance, which leaves me wondering how to mix custom scoring functions with location.

And the issue with the second method is that the search results come back as a custom Searchkick object that does not seem to work with the normal array or hash sorting mechanisms needed for post-processing.

Is there a way to do something pre- or post- query to promote a document in the results in this manner?

Thanks.


Solution

  • In fact, there are many ways to "control" the scoring. If you already know before indexing that some document should get a high score, you can give that document a boost at index time; see the reference here.

    If you cannot determine the boost before indexing, you can boost in the query instead. There are many options for boosting at query time, and which one applies depends on the kind of query you use.

    For query string query:

    You can boost some fields, such as "fields" : ["content", "name.*^5"], or boost individual terms in the query text, such as quick^2 fox (this might work for you: just give the name an extra boost).
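
As a rough illustration of the field-boost idea above, here is a minimal Ruby sketch that builds a query_string body as a plain hash. The helper name `name_weighted_query` is hypothetical, and the field names `content` and `name` are taken from the snippet above rather than from the asker's real mapping.

```ruby
require "json"

# Hypothetical helper: a query_string search over two fields, where matches
# on `name` are weighted five times as heavily as matches on `content`.
def name_weighted_query(text)
  {
    query: {
      query_string: {
        fields: ["content", "name^5"],
        query: text
      }
    }
  }
end

# A per-term boost can also live in the query text itself: "quick" counts double.
puts JSON.pretty_generate(name_weighted_query("quick^2 fox"))
```

The hash serializes directly to the JSON request body, so it can be passed to whichever client the app already uses.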

    For others:

    You can add a boost to a term query, for example boosting the "ivan" case:

    "term" : {"name" : {"value" : "ivan","boost" : 10.0}}

    You can also wrap it in a bool query and boost the desired case, e.g. find all 'ivan' and boost 'ji' on the name field:

    { "query" : { "bool" : { "must": [{"match":{"name":"ivan"}}],
    "should" : [ { "term" : { "name": { "value" : "ji", "boost" : 10 }}}]}}}

    Besides the term query, many other queries support boost, such as the prefix query and the match query, so you can pick whichever fits your situation. Here are some official examples: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html

    Boosting may not be an easy way to control the score, because boosted scores are normalized. To set the score directly, you can use the function_score query. It is a really useful query if you need more direct control.
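
To make the function_score suggestion more concrete, here is a hedged sketch of one way it could combine distance and an exact-name promotion. Everything specific in it is an assumption for illustration: the helper name `promoted_geo_query`, the `loc` and `name` fields, the 10km decay scale and the weight of 100. Note also that a term filter only behaves like an exact string match if the field is indexed unanalyzed, and that older Elasticsearch releases may spell the weight option differently, so check the docs for your version.

```ruby
require "json"

# Hypothetical helper. Field names, the 10km scale and the weight of 100
# are illustrative assumptions, not values from the question.
def promoted_geo_query(text, exact_name, lat:, lon:)
  {
    query: {
      function_score: {
        query: { match: { name: text } },
        functions: [
          # Distance decay: yields a value between 0 and 1, highest at the origin.
          { gauss: { loc: { origin: { lat: lat, lon: lon }, scale: "10km" } } },
          # Fixed weight applied only to documents whose name term equals exact_name.
          { filter: { term: { name: exact_name } }, weight: 100 }
        ],
        score_mode: "sum",     # add the function values together
        boost_mode: "replace"  # use the function result as the final score
      }
    }
  }
end

puts JSON.pretty_generate(promoted_geo_query("coffee", "acme coffee", lat: 10, lon: 10))
```

Because the decay value never exceeds 1 while the weight is 100, exact matches should always sort ahead of the rest, with distance deciding the order inside each group. No explicit sort clause is needed, since results default to descending score.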


    In short, you can wrap your query in a bool query and add a boost for the name match, as follows:

    {
        "query" : {
            "bool" : {
                "must" : [
                    { "filtered" : {
                        "filter" : {
                            "geo_distance" : {
                                "distance" : "2000km",
                                "loc" : { "lat" : 10, "lon" : 10 }
                            }
                        }
                    }}
                ],
                "should" : [
                    { "term" : { "name" : { "value" : "ivan", "boost" : 10 }}}
                ]
            }
        },
        "sort" : [
            "_score",
            {
                "_geo_distance" : {
                    "loc" : [10, 10],
                    "order" : "asc",
                    "unit" : "km",
                    "mode" : "min",
                    "distance_type" : "sloppy_arc"
                }
            }
        ]
    }
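
The `filtered` query used above was deprecated in Elasticsearch 2.0 and removed in 5.0, where a `filter` clause on the bool query takes its place. The following Ruby sketch shows what an equivalent request body might look like on a newer cluster. The helper name `nearby_with_name_boost` is hypothetical, and the `loc` and `name` fields are carried over from the example rather than from the asker's mapping.

```ruby
require "json"

# Hypothetical helper: restrict hits to a radius (filter context, no scoring),
# boost documents whose name term matches, then sort by score and distance.
def nearby_with_name_boost(name, lat:, lon:, distance: "2000km", boost: 10)
  origin = { lat: lat, lon: lon }
  {
    query: {
      bool: {
        filter: [{ geo_distance: { distance: distance, loc: origin } }],
        # Optional clause: matching documents score higher, others score 0.
        should: [{ term: { name: { value: name, boost: boost } } }]
      }
    },
    sort: [
      "_score", # descending by default
      { _geo_distance: { loc: origin, order: "asc", unit: "km" } }
    ]
  }
end

body = nearby_with_name_boost("ivan", lat: 10, lon: 10)
puts JSON.pretty_generate(body)
```

Searchkick documents a `body:` option for sending a raw request like this one, along the lines of `Business.search(body: body)`, where `Business` is a made-up model name; verify that against the Searchkick version you run.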
    

    For more detail, you can check my gist: https://gist.github.com/hxuanji/e5acd9a5174ea10c08b8. There I boost the "ivan" name, and in the result the "ivan" document comes first rather than the (10,10) document.
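
For the post-processing route raised in the question, a small re-sort in plain Ruby may be enough. This is only a sketch under assumptions: it supposes the Searchkick results object is enumerable so that `to_a` yields an ordinary Array, and it uses a hypothetical `Business` struct in place of real records. It also reorders only the page of results already fetched.

```ruby
# Stand-in for a search hit; a real result would be a model record.
Business = Struct.new(:name, :distance_km)

# Move exact (case-insensitive) name matches to the front. partition keeps
# the incoming order inside each group, so the distance ordering survives.
def promote_exact_matches(results, query)
  wanted = query.strip.downcase
  exact, rest = results.to_a.partition { |r| r.name.to_s.strip.downcase == wanted }
  exact + rest
end

hits = [
  Business.new("Joe's Diner", 0.4),
  Business.new("Blue Bottle", 0.9),
  Business.new("Acme Coffee", 1.5),
  Business.new("Acme Coffee Roasters", 2.0)
]

p promote_exact_matches(hits, "acme coffee").map(&:name)
# => ["Acme Coffee", "Joe's Diner", "Blue Bottle", "Acme Coffee Roasters"]
```

Doing the promotion in the query itself is usually preferable, since a match that falls outside the fetched page can never be promoted this way.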