
Why does Elasticsearch sum score become inaccurate with big numbers?


If I have an index that looks something like this:

[{:index=>
   {:_index=>"candidates",
    :_id=>"a1786607-e095-4621-bdf9-de2706475614",
    :data=>
     {:name=>"Carli Stark",
      :is_verified=>true, :has_work_permit=>true}}},
 {:index=>
   {:_index=>"candidates",
    :_id=>"57f78d3f-392e-4cdf-a5ff-6d10e7c89d5b",
    :data=>
     {:name=>"Gayla Keeling",
      :is_verified=>false, :has_work_permit=>true}}}]

And I make a query with score_mode: sum and boost_mode: replace (since I want only my function scores to be considered, not the query's relevance score):

GET candidates/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "filter": {
            "term": {
              "is_verified": true
            }
          },
          "weight": 1000
        },
        {
          "filter": {
            "term": {
              "has_work_permit": true
            }
          },
          "weight": 100000000000
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  },
  "_source": ["is_verified"],
  "size": 50
}

Then why does Elasticsearch return exactly the same score for both documents? (Also note that the ordering is wrong: the unverified candidate is ranked first.)

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 9.9999998E10,
    "hits" : [
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "cbd1b70b-f889-4136-a43e-f6782955f58e",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : false,
          "has_work_permit" : true
        }
      },
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "d644a5e5-09e0-496e-8830-c1a772c46611",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : true
          "has_work_permit" : true
        }
      }
    ]
  }
}

If I use a bigger weight for the is_verified function (such as 10000 instead of 1000), then the scores are different, as expected:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.00000006E11,
    "hits" : [
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "d644a5e5-09e0-496e-8830-c1a772c46611",
        "_score" : 1.00000006E11,
        "_source" : {
          "is_verified" : true,
          "has_work_permit" : true
        }
      },
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "cbd1b70b-f889-4136-a43e-f6782955f58e",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : false
          "has_work_permit" : true
        }
      }
    ]
  }
}

But how do I make the scoring accurate? I need the smaller weights to still be reflected in the score, regardless of how large the total gets.

My Elasticsearch version is 7.10.1 (AWS ES).


Solution

  • This is because Elasticsearch stores the score as a single-precision (32-bit) float, which can only represent integers exactly up to 2^24 (16,777,216). Around 10^11, adjacent representable floats are 8,192 apart, so adding 1000 rounds back to the same value, while adding 10000 is large enough to change it, which matches the behavior above; see the demonstration below.

    Relevant issue here.

    My solution was to apply the sqrt function to the weights, which keeps the numbers small enough for float precision while preserving a similar distribution; see the adjusted query sketch below.
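
    To see the precision cliff concretely, here is a minimal sketch using NumPy's float32 (NumPy and the literal values are my own illustration, not part of the original answer):

    import numpy as np

    # Elasticsearch scores are 32-bit floats. Integers are exact only up to 2**24.
    print(np.float32(2**24) + np.float32(1) == np.float32(2**24))   # True: 16777217 is not representable

    big = np.float32(100_000_000_000)   # the 1e11 weight; stored exactly as 99999997952,
                                        # which Elasticsearch prints as 9.9999998E10
    print(np.spacing(big))              # 8192.0: the gap between adjacent float32 values here

    print(big + np.float32(1_000) == big)    # True:  +1000 rounds back to the same float
    print(big + np.float32(10_000) == big)   # False: +10000 is big enough to move the score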
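
    Applied to the weights from the question, sqrt(1000) ≈ 32 and sqrt(100000000000) ≈ 316228; their sum stays far below 2^24, so both functions register in the score. A sketch of the adjusted query, with rounded weights as an assumption (the original answer does not show its exact values):

    GET candidates/_search
    {
      "query": {
        "function_score": {
          "query": {
            "match_all": {}
          },
          "functions": [
            {
              "filter": {
                "term": {
                  "is_verified": true
                }
              },
              "weight": 32
            },
            {
              "filter": {
                "term": {
                  "has_work_permit": true
                }
              },
              "weight": 316228
            }
          ],
          "score_mode": "sum",
          "boost_mode": "replace"
        }
      },
      "_source": ["is_verified", "has_work_permit"],
      "size": 50
    }

    Since sqrt is monotonic, the relative ordering of the individual weights is preserved, although the gaps between them shrink.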