elasticsearch, types, field

Elasticsearch “data”: { “type”: “float” } query returns incorrect results


I have a query like the one below. When the date_partition field is mapped as "type" => "float", it returns documents for 20220109, 20220108, and 20220107. When the field is mapped as "type" => "long", it returns only 20220109, which is what I want.

For each of the query values 20220109, 20220108, and 20220107, the float field returns the same result, as if 20220109 had been sent.

PUT date
{
  "mappings": {
    "properties": {
      "date_partition_float": {
        "type": "float"
      },
      "date_partition_long": {
        "type": "long"
      }
    }
  }
}
POST date/_doc
{
  "date_partition_float": "20220109",
  "date_partition_long": "20220109"
}
# returns the document
GET date/_search
{
  "query": {
    "match": {
      "date_partition_float": "20220108"
    }
  }
}
# returns nothing
GET date/_search
{
  "query": {
    "match": {
      "date_partition_long": "20220108"
    }
  }
}

Is this a bug, or is this how the float type works? Two years of data are already loaded into Elasticsearch (like day-1, day-2), at about 20 GB of primary shard size per day and 15 TB in total. What is the best way to change the type of just this field? I have 5 float fields in my mapping; what is the fastest way to change all of them? Note: I have the solutions below in mind, but I'm afraid they will be slow.

  • update by query API
  • reindex API
  • runtime fields in the search request (especially this one)

Thank you!

Solution

  • Here is the answer to my question => https://discuss.elastic.co/t/elasticsearch-data-type-float-returns-incorrect-results/300335

    You're running into some Java quirks here (working as intended, however). If you want to reproduce this, run jshell locally and type in the following:

    Float.valueOf(20220109.0f);

    The result is 2.0220108E7 because of rounding: floating-point values are not stored exactly.

    You can use the reindex functionality to reindex your data into an index with the mapping fixed (you could also add new fields to the existing index and use update-by-query, but I am not sure that is clean).
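
To make the rounding described in the answer easier to see, here is a minimal Java sketch. It assumes that a field mapped as float keeps values as 32-bit IEEE 754 floats, the same as Java's float, which is what the linked answer suggests. The class name FloatPrecisionDemo and the helper throughFloat are hypothetical names used only for this illustration; they are not part of Elasticsearch.

```java
public class FloatPrecisionDemo {

    // Round-trip an integer through a 32-bit float, which is (by assumption)
    // what happens to a value indexed into or queried against a float field.
    static long throughFloat(long value) {
        return (long) (float) value;
    }

    public static void main(String[] args) {
        long[] dates = {20220107L, 20220108L, 20220109L, 20220110L};
        for (long date : dates) {
            // Print the original value next to what survives as a float.
            System.out.println(date + " -> " + throughFloat(date));
        }
        // Distance between neighbouring float values at this magnitude.
        System.out.println("float spacing near 20220109: " + Math.ulp(20220109f));
    }
}
```

At this magnitude a float can only represent every second integer, so 20220107, 20220108, and 20220109 all collapse to 20220108, while 20220110 is stored exactly. That would explain why the three query values match the same document on the float field, and why the long field, which stores integers exactly, only matches 20220109.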