
Elasticsearch sorting with specific importance for field values


I am using Java, Spring Data and the Elasticsearch 6.8.14 API to communicate with Elasticsearch. I have an index that returns data like this (I am including this search result to also show the mapping structure):

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "rgt",
        "_type" : "carindexeddata",
        "_id" : "6020354",
        "_score" : 1.0,
        "_source" : {
          "id" : "4441",
          "version" : null,
          "carId" : "1263",
          "mark" : "ford",
          "colour" : "green",
          "status" : "Approved",
        
......

So basically I store cars. Now I need to sort them before returning them to the user. I have to sort them by:

 - mark
 - colour (within the same mark, the colour order matters)
 - status

For status, the sort order should be as follows:

1. BOUGHT
2. IN PRODUCTION
3. IN TESTS
4. APPROVED

So, for example, this order of cars would be correct:

1. Ford Black Bought
2. Ford Black Approved
3. Ford White Bought
4. GMC White Bought
5. GMC White Approved
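To make the intended order concrete, here is a plain-Java sketch (not Elasticsearch code) that sorts the five example cars client-side with a Comparator: mark first, then colour, then status by rank instead of alphabetically. The Car class and the STATUS_RANK table are hypothetical names for illustration; statuses are looked up case-insensitively because the indexed data stores "Approved" rather than "APPROVED".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class CarOrder {
    // Hypothetical rank table mirroring the required status order; lower rank sorts first.
    static final Map<String, Integer> STATUS_RANK = Map.of(
            "BOUGHT", 1,
            "IN PRODUCTION", 2,
            "IN TESTS", 3,
            "APPROVED", 4);

    static final class Car {
        final String mark;
        final String colour;
        final String status;

        Car(String mark, String colour, String status) {
            this.mark = mark;
            this.colour = colour;
            this.status = status;
        }

        @Override
        public String toString() {
            return mark + " " + colour + " " + status;
        }
    }

    // Case-insensitive lookup; unknown statuses sort after all known ones.
    static int statusRank(String status) {
        return STATUS_RANK.getOrDefault(status.toUpperCase(), Integer.MAX_VALUE);
    }

    // Mark, then colour within the same mark, then status by rank (not alphabetically).
    static final Comparator<Car> ORDER = Comparator
            .comparing((Car c) -> c.mark)
            .thenComparing(c -> c.colour)
            .thenComparingInt(c -> statusRank(c.status));

    public static void main(String[] args) {
        List<Car> cars = new ArrayList<>(Arrays.asList(
                new Car("GMC", "White", "Approved"),
                new Car("Ford", "White", "Bought"),
                new Car("Ford", "Black", "Approved"),
                new Car("GMC", "White", "Bought"),
                new Car("Ford", "Black", "Bought")));
        cars.sort(ORDER);
        // Prints the five cars in the order listed in the question.
        cars.forEach(System.out::println);
    }
}
```

Sorting all results in the application like this only works if every hit is fetched; with paging, the ordering has to happen inside Elasticsearch, which is what the question is about.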

Which Elasticsearch mechanism could I use to sort items this way? Is it possible to implement? Can you show an example? Simply sorting by the mark, colour and status fields is not correct, because the status order follows custom logic: it is not alphabetical sorting but rather weighted sorting. How can I give specific weights to specific statuses in Elasticsearch? Should I store a field with a number for each status in Elasticsearch and sort on that numeric field instead of the status field directly?


Solution

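  • For the status field, you can use a script sort. The query below assumes that mark, colour and status each have a .keyword sub-field, which is what the default dynamic mapping creates for string fields in 6.x: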

    GET rgt/_search
    {
      "sort": [
        { "mark.keyword": "asc" },
        { "colour.keyword": "asc" },
        {
          "_script": {
            "type": "number",
            "script": {
              "lang": "painless",
              "source": """
                // Documents without a status sort after all others
                if (doc['status.keyword'].size() == 0) return 5;
                String status = doc['status.keyword'].value.toUpperCase();
                if (status == "BOUGHT") return 1;
                else if (status == "IN PRODUCTION") return 2;
                else if (status == "IN TESTS") return 3;
                else return 4;
              """
            },
            "order": "asc"
          }
        }
      ]
    }
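If the list of statuses may change, one variation (not part of the answer above) is to keep the Painless source fixed and pass the weights as script params. Elasticsearch caches compiled scripts by their source, so changing only the params does not force a recompile. The sketch below builds that params map in plain Java; the class name, the "weights" and "missing" param names, and the choice to sort unknown statuses last (weight 5 here, where the script above returns 4) are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StatusSortScript {
    // Painless source that reads the weights from params instead of hard-coding them.
    static final String SOURCE =
            "if (doc['status.keyword'].size() == 0) { return params.missing; } "
            + "String s = doc['status.keyword'].value.toUpperCase(); "
            + "return params.weights.containsKey(s) ? params.weights.get(s) : params.missing;";

    // Position in the list (1-based) becomes the sort weight of that status.
    static Map<String, Object> buildParams(List<String> orderedStatuses) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        for (int i = 0; i < orderedStatuses.size(); i++) {
            weights.put(orderedStatuses.get(i).toUpperCase(), i + 1);
        }
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("weights", weights);
        // Missing or unknown statuses get a weight after every known one.
        params.put("missing", orderedStatuses.size() + 1);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params =
                buildParams(List.of("BOUGHT", "IN PRODUCTION", "IN TESTS", "APPROVED"));
        System.out.println(params);
    }
}
```

With the 6.8 Java client, SOURCE and this map could be passed as new Script(ScriptType.INLINE, "painless", SOURCE, params) to SortBuilders.scriptSort(script, ScriptSortBuilder.ScriptSortType.NUMBER); verify this against the client version and Spring Data wrapper you actually use.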
    

    Keep in mind that script sorts are slow: the script is executed for every matching document on every search.

    Elasticsearch works best when data is preprocessed at index time. If you can store a numeric field that represents the status weight, sorting on it will perform better than the script. Test both approaches to see what works best for your case.
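A minimal sketch of the preprocessing approach, assuming the weight is computed in the application just before a car is indexed. The CarStatus enum, the toDocument helper and the statusOrder field name are hypothetical; the document is shown as a plain Map rather than a Spring Data entity so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class StatusOrderField {
    // Hypothetical enum: the rank is what gets indexed as a numeric field.
    enum CarStatus {
        BOUGHT(1), IN_PRODUCTION(2), IN_TESTS(3), APPROVED(4);

        final int rank;

        CarStatus(int rank) {
            this.rank = rank;
        }

        // Maps stored labels such as "Approved" or "In production" to the enum constant.
        // Throws IllegalArgumentException for an unknown label.
        static CarStatus fromLabel(String label) {
            return valueOf(label.trim().toUpperCase(Locale.ROOT).replace(' ', '_'));
        }
    }

    // Builds the document source with the extra numeric field (hypothetical name "statusOrder").
    static Map<String, Object> toDocument(String id, String mark, String colour, String status) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("mark", mark);
        doc.put("colour", colour);
        doc.put("status", status);
        doc.put("statusOrder", CarStatus.fromLabel(status).rank);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument("4441", "ford", "green", "Approved"));
    }
}
```

With such a field in place, the script is no longer needed and the sort becomes [{"mark.keyword": "asc"}, {"colour.keyword": "asc"}, {"statusOrder": "asc"}]. The trade-off is that existing documents must be reindexed (or updated) whenever the status weights change.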