Elasticsearch - query documents by term and field priority


I'm working with Elasticsearch and trying to implement a query from a Java backend that retrieves documents from my index not only by term but also by field priority. Each document in my index has a term and a field that specifies its type.

For example:
term: "Flu Shot"
type: "procedure"

term: "Fluphenazine"
type: "drug"

I already have a query that searches by term, and Elasticsearch returns the most relevant results for that term. What I want is a query that returns results matching the same term but ordered by a priority on the 'type' field. For example, when I type "flu", I want the documents with type "procedure" first, followed by those with type "drug". Currently, the index returns only documents with type "drug", because many drug names start with "flu".


Solution

  • You can use function_score.

    A function_score query lets you modify the score of the documents retrieved by a query. To use it, you define a query and one or more functions that compute a new score for each document the query returns.
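To make the arithmetic concrete, here is a minimal Java sketch of how weight functions combine with the query score under the defaults (`score_mode` and `boost_mode` are both `multiply`). It is only a simulation, not Elasticsearch code: the class name, the `WEIGHTS` map, and `finalScore` are hypothetical, and it assumes a document that matches no function keeps its query score unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FunctionScoreSketch {

    // Hypothetical weights, keyed by the value of the "type" field.
    static final Map<String, Double> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("procedure", 2.0);
        WEIGHTS.put("drug", 1.0);
    }

    // Simulates the default combination: the weights of all functions whose
    // filter matches are multiplied together (score_mode = multiply), and the
    // product is multiplied with the query score (boost_mode = multiply).
    // Assumption: if no function matches, the factor stays at 1.0.
    static double finalScore(double queryScore, String type) {
        double functionFactor = 1.0;
        for (Map.Entry<String, Double> entry : WEIGHTS.entrySet()) {
            if (entry.getKey().equals(type)) {
                functionFactor *= entry.getValue();
            }
        }
        return queryScore * functionFactor;
    }

    public static void main(String[] args) {
        System.out.println(finalScore(1.0, "procedure")); // prints 2.0
        System.out.println(finalScore(1.0, "drug"));      // prints 1.0
    }
}
```

With a query that scores every hit identically, the weight alone decides the order. With a relevance-scoring query, a strongly matching low-weight document could still outrank a weakly matching high-weight one, so the weights may need to be spread further apart.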

    Example using the data from the question (tested on Elasticsearch 7.9):

    1. Create the index and add the documents

       PUT /example_index
       {
         "mappings": {
           "properties": {
             "term": {"type": "text" },
             "type": {"type": "keyword"}
           }
         }
       }
      
       PUT /_bulk
       {"create": {"_index": "example_index", "_id": 1}}
       {"term": "Flu Shot", "type": "procedure"}
       {"create": {"_index": "example_index", "_id": 2}}
       {"term": "Fluphenazine", "type": "drug"}
       {"create": {"_index": "example_index", "_id": 3}}
       {"term": "Flu Shot2", "type": "procedure"}
       {"create": {"_index": "example_index", "_id": 4}}
       {"term": "Fluphenazine2", "type": "drug"}
      
    2. Query documents using custom scoring logic

       GET /example_index/_search
       {
         "query": {
           "function_score": {
             "query": {
               "wildcard": {
                 "term": {
                   "value": "*flu*"
                 }
               }
             },
             "functions": [
               {
                 "filter": {
                   "term": {
                     "type": "procedure"
                   }
                 },
                 "weight": 2
               },
               {
                 "filter": {
                   "term": {
                     "type": "drug"
                   }
                 },
                 "weight": 1
               }
             ]
           }
         }
       }
      
    3. Results:

       {
         "took" : 2,
         "timed_out" : false,
         "_shards" : {
           "total" : 1,
           "successful" : 1,
           "skipped" : 0,
           "failed" : 0
         },
         "hits" : {
           "total" : {
             "value" : 4,
             "relation" : "eq"
           },
           "max_score" : 2.0,
           "hits" : [
             {
               "_index" : "example_index",
               "_type" : "_doc",
               "_id" : "1",
               "_score" : 2.0,
               "_source" : {
                 "term" : "Flu Shot",
                 "type" : "procedure"
               }
             },
             {
               "_index" : "example_index",
               "_type" : "_doc",
               "_id" : "3",
               "_score" : 2.0,
               "_source" : {
                 "term" : "Flu Shot2",
                 "type" : "procedure"
               }
             },
             {
               "_index" : "example_index",
               "_type" : "_doc",
               "_id" : "2",
               "_score" : 1.0,
               "_source" : {
                 "term" : "Fluphenazine",
                 "type" : "drug"
               }
             },
             {
               "_index" : "example_index",
               "_type" : "_doc",
               "_id" : "4",
               "_score" : 1.0,
               "_source" : {
                 "term" : "Fluphenazine2",
                 "type" : "drug"
               }
             }
           ]
         }
       }
      

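    The documents with type procedure score higher than the documents with type drug, so they are returned first. Each function's weight is multiplied into the query score of the documents its filter matches, and the wildcard query gives every match the same constant score of 1.0, so the weight ends up as the final score.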
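Since the question mentions a Java backend, here is a rough sketch of sending the query from step 2 using only the JDK's built-in HTTP client (Java 11+). The class name, the `buildQuery` helper, and the `http://localhost:9200` address are assumptions made for illustration; a real project would more likely use an Elasticsearch client library and a proper JSON builder.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TypePrioritySearch {

    // Hypothetical helper: builds the function_score body from step 2 for a
    // search text and a type-to-weight map.
    static String buildQuery(String text, Map<String, Integer> typeWeights) {
        // Lowercase the input because the wildcard pattern is not analyzed,
        // while the standard analyzer lowercases the indexed terms.
        // Only backslashes and quotes are escaped; this is not a full JSON encoder.
        String safe = text.toLowerCase(Locale.ROOT)
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        List<String> functions = new ArrayList<>();
        for (Map.Entry<String, Integer> e : typeWeights.entrySet()) {
            functions.add("{\"filter\":{\"term\":{\"type\":\"" + e.getKey()
                    + "\"}},\"weight\":" + e.getValue() + "}");
        }
        return "{\"query\":{\"function_score\":{"
                + "\"query\":{\"wildcard\":{\"term\":{\"value\":\"*" + safe + "*\"}}},"
                + "\"functions\":[" + String.join(",", functions) + "]}}}";
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("procedure", 2); // higher weight, listed first in results
        weights.put("drug", 1);

        // Assumed local single-node cluster; adjust host and index as needed.
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:9200/example_index/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildQuery("flu", weights)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Keeping the weights in a map makes it easy to add further types or change their priority without touching the query template.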