Search code examples
elasticsearchelasticsearch-5elasticsearch-painless

MySql Order By Value equivalent in ElasticSearch 5.6


ElasticSearch Version: 5.6

I have imported MySQL data in ElasticSearch and I have added mapping to the elastic search as required. Following is one mapping for the column application_status.

Mappings:

{
"settings": {
    "analysis": {
        "analyzer": {
            "case_insensitive": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"]
            }
        }
    }
},
"mappings": {
    "lead": {
        "properties": {
            "application_status": {
                "type": "string",
                "analyzer": "case_insensitive",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}}

On the above mapping, I am able to do simple sorting (asc or desc) using following query:

{
"size": 50,
"from": 0,
"sort": [{
    "application_status.keyword": {
        "order": "asc"
    }
}]}

which is MySql equivalent of

select * from <table_name> order by application_status asc limit 50;

Need help on following problem: I have MySQL query which sorts based on application_status:

select * from vLoan_application_grid order by CASE WHEN application_status = "IP_QUAL_REASSI" THEN application_status END desc, CASE WHEN application_status = "IP_COMPLE" THEN application_status END desc, CASE WHEN application_status LIKE "IP_FRESH%" THEN application_status END desc, CASE WHEN application_status LIKE "IP_%" THEN application_status END desc

Please help me write the same query in ElasticSearch. I am not able to find order by value equivalent for strings in ElasticSearch. Searching online, I understood that, I should use sorting scripts but not able to find any proper documentation.

I have following query which just does simple sort.

{
"size": 500,
"from": 0,
"query" : {
    "match_all": {}
},
"sort": {
    "_script": {
        "type": "string",
        "script": {
            "source": "doc['application_status.keyword'].value",
            "params": {
                "factor": ["IP_QUAL_REASS", "IP_COMPLE"]
            }
        },
        "order": "desc"
    }
}}

In the above query, I am not using params section as I am not aware how to use it for type: string

I believe I am asking too much. Please help or any relevant documentation links would be greatly appreciated. Hope question is clear. I'll provide more details if necessary.


Solution

  • You have two options:

    • the most performant one is to index at indexing time another field that should be a number. This number (your choice) will be the numerical representation of that status. Then at search time, you simply sort by that number and not by the status
    • at search time use a script that will do almost the same thing as the first option, but dynamically, and less performant (but still quite fast)

    Below you have the second option:

      "sort": {
        "_script": {
          "type": "number",
          "script": {
            "source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else return 1000;",
            "params": {
              "factor": [{
                "IP_QUAL_REASS":1,
                "IP_COMPLE":2,
                "whatever":3
              }
              ]
            }
          },
          "order": "asc"
        }
      }
    

    If you also want things like LIKE WHATEVER%, my suggestion is to consider an indexing time change, rather than search time because the script gets more complex. But, this is the one for wildcard matches as well:

      "sort": {
        "_script": {
          "type": "number",
          "script": {
            "source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else { params.wildcard_factors[0].entrySet().stream().filter(kv -> doc['application_status.keyword'].value.startsWith(kv.getKey())).map(Map.Entry::getValue).findFirst().orElse(1000)}",
            "params": {
              "factor": [
                {
                  "IP_QUAL_REASS": 1,
                  "IP_COMPLE": 2,
                  "whatever": 3
                }
              ],
              "wildcard_factors": [
                {
                  "REJ_": 66
                }
              ]
            }
          },
          "order": "asc"
        }
      }