Search code examples
elasticsearchelasticsearch-7

Elastic search score documents based on how close numerically a string is


Assuming we have documents with the following format in elastic indexed:

{
  "street": "Adenauer Allee",
  "number": "119",
  "zipcode": "53113"
}

and we have a query like:

{
    "from": 0,
    "size": 1,
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "zipcode": {
                            "query": "53113",
                            "fuzziness": "0"
                        }
                    }
                },
                {
                    "match": {
                        "street": {
                            "query": "Adenauer Allee",
                            "fuzziness": "auto"
                        }
                    }
                }
            ],
            "should": [
                {
                    "match": {
                        "number": {
                            "query": "119"
                        }
                    }
                } 
            ]
        }
    }
}

Now let's say that our index contains 3 documents with

street: "Adenauer Allee"
zipcode: "53113"

and they have different house numbers like:

doc1: number: "11"
doc2: number: "120"
doc3: number: "10a"

(Notice the "a" in doc3).

The above query will return as a result doc1 with number "11" (since it's closer alphanumerically).

Desired behavior is to return first the document with the closest numerical value. In the above scenario this is doc2 with number "120".

How can I achieve that?

Elastic search info:

{
"name": "193a315bccae",
"cluster_name": "demo",
"cluster_uuid": "kg3tZZOyqOgqTbn_elqs_g",
"version": {
"number": "7.5.1",
"build_flavor": "default",
"build_type": "docker",
"build_hash": "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
"build_date": "2019-12-16T22:57:37.835892Z",
"build_snapshot": false,
"lucene_version": "8.3.0",
"minimum_wire_compatibility_version": "6.8.0",
"minimum_index_compatibility_version": "6.0.0-beta1"
},
"tagline": "You Know, for Search"
}

Solution

  • The script_score-query allows you to implement your custom scoring logic (see Elasticsearch reference: Script Score Query). Rather than implementing your own script, you can also use one of the predefined decay functions for numeric fields, assuming that you "cleaned-up" the street numbers from characters (you can can convert number into a multi-field and store the numeric part of it separately, e.g. number.numeric)

    In previous versions of Elasticsearch you can use the function_score-query to implement the same logic (see Elasticsearch reference: Function Score Query).