Search code examples
elasticsearch

Odd behavioir of range in elasticsearch restful query when I have more than on range in the query


I am using elasticsearch 8.8.0

I currently have this query

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "deliveryStatus": {
                            "gte": -2,
                            "lte": 9
                        }
                    }
                },
                {
                    "range": {
                        "cardDate": {
                            "gte": "2024-04-16",
                            "lte": "2024-04-16"
                        }
                    }
                }
            ]
        }
    }
}

which gives me this result

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 828,
            "relation": "eq"
        },
        "max_score": 2.0,
        "hits": [
            {
                "_index": "credit",
                "_id": "70230254011604",
                "_score": 2.0,
                "_source": {
                    "cardDate": "2024-04-16T00:00:00",
                    "deliveryStatus": "3"
                }
            }
        ]
    }
}

However, when I tried to increase the upper bound of deliveryStatus from 9 to 10 as shown in this query

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "deliveryStatus": {
                            "gte": -2,
                            "lte": 10
                        }
                    }
                },
                {
                    "range": {
                        "cardDate": {
                            "gte": "2024-04-16",
                            "lte": "2024-04-16"
                        }
                    }
                }
            ]
        }
    }
}

In my understanding, the result should remain the same - however when I hit it; this is the result that I got

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

Which is odd and not aligned with my current understanding. Out of curiosity, i tried to remove the cardDate to see if I got the same result or not as shown in this query

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "deliveryStatus": {
                            "gte": -2,
                            "lte": 10
                        }
                    }
                }
            ]
        }
    }
}

and I got this result

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "credit",
                "_id": "65391401441421011",
                "_score": 1.0,
                "_source": {
                    "cardDate": "2023-11-10T00:00:00"
                    "deliveryStatus": -1
                }
            }
        ]
    }
}

Can someone explain on why and how to fix it if I want the upper bound of the deliveryStatus is 13 (I have tried to change to 10 or above, it shows me an empty list)

/EDIT: I have read that it might be an issue on mismatch data type in my index. However, if that was the case - wouldn't that make the first query won't work either? The current data type for deliveryStatus in my elasticSearch is "Text" I don't want delete my index until I understand the problem since there are already quite a lot of data within the index

//EDIT: As per requested, below is the mapping of my index

cardDate


{
    "credit": {
        "mappings": {
            "cardDate": {
                "full_name": "cardDate",
                "mapping": {
                    "cardDate": {
                        "type": "date"
                    }
                }
            }
        }
    }
}

and this is deliveryStatus

{
    "credit": {
        "mappings": {
            "deliveryStatus": {
                "full_name": "deliveryStatus",
                "mapping": {
                    "deliveryStatus": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        },
                        "fielddata": true
                    }
                }
            }
        }
    }
}

Solution

  • @Gagak

    Here is how it works -

    Range query works basis on the field type. If it is string type like keyword or text then it sort all terms in Alphabetical order.

    And if it is numeric field type, then it will sort on all numeric value.

    But in your case -

    The numeric value is mapped with the type keyword so it is trying to sort alphabetically but eventually return wrong result.

    Lets understand with an example - Create an index with keyword field type and index some data.

    PUT test-range
    {
      "mappings": {
        "properties": {
          "message":{
            "type": "keyword"
          }
        }
      }
    }
    
    POST test-range/_doc
    {
      "message":17
    }
    
    POST test-range/_doc
    {
      "message":2
    }
    
    POST test-range/_doc
    {
      "message":23
    }
    
    POST test-range/_doc
    {
      "message":"ab"
    }
    
    POST test-range/_doc
    {
      "message":"ac"
    }
    
    POST test-range/_doc
    {
      "message":"ad"
    }
    
    POST test-range/_doc
    {
      "message":"ac1"
    }
    
    

    Let's perform range query for text -

    GET test-range/_search
    {
      "query": {
        "range": {
          "message": {
            "gte": "ac",
            "lte": "ad"
          }
        }
      }
    }
    

    It will give you proper result like below

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 3,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "test-range",
            "_id": "zfeT944BZFU3azzCOTSB",
            "_score": 1,
            "_source": {
              "message": "ac"
            }
          },
          {
            "_index": "test-range",
            "_id": "OF6T944BMvvuJ06pQGtI",
            "_score": 1,
            "_source": {
              "message": "ad"
            }
          },
          {
            "_index": "test-range",
            "_id": "3feT944BZFU3azzCjDQP",
            "_score": 1,
            "_source": {
              "message": "ac1"
            }
          }
        ]
      }
    }
    

    Now lets perform range query on number (keyword type) -

    GET test-range/_search
    {
      "query": {
        "range": {
          "message": {
            "gte": 17,
            "lte": 23
          }
        }
      }
    }
    

    Response

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 3,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "test-range",
            "_id": "OfeR944BZFU3azzCYDQ6",
            "_score": 1,
            "_source": {
              "message": 17
            }
          },
          {
            "_index": "test-range",
            "_id": "pV6R944BMvvuJ06paWrr",
            "_score": 1,
            "_source": {
              "message": 2
            }
          },
          {
            "_index": "test-range",
            "_id": "S_eR944BZFU3azzCdDSd",
            "_score": 1,
            "_source": {
              "message": 23
            }
          }
        ]
      }
    }
    

    Here it is trying to sort alphabetically which is not possible on Numeric value. So it is showing in same order as you indexed. For example if i will try to find range between 2 - 23 it will not return 17.

    Solution

    Use Numeric field type to perform range query on numbers and for dates you can use date type. Adding proper field type should solve your problem.