
Difference in elasticsearch index size with same data and number of documents


I have multiple Elasticsearch clusters. Every cluster has the same indices, with the same data and the same number of documents, yet the index sizes differ significantly. I tried the merge API, but it did not help. Because of this, Elasticsearch eventually runs out of disk space:

{
    "state": "UNASSIGNED",
    "primary": true,
    "node": null,
    "relocating_node": null,
    "shard": 3,
    "index": "local-deals-1624295772015",
    "recovery_source":
    {
        "type": "EXISTING_STORE"
    },
    "unassigned_info":
    {
        "reason": "ALLOCATION_FAILED",
        "at": "2021-08-18T19:14:20.472Z",
        "failed_attempts": 20,
        "delayed": false,
        "details": "shard failure, reason [lucene commit failed], failure IOException[No space left on device]",
        "allocation_status": "deciders_no"
    }
}

I have configured the Elasticsearch cluster to allow no more than 2 shards per node, to improve query performance.

Cluster-1: [image not included]

Cluster-2: [image not included]

These two clusters hold the same documents, yet their index sizes differ by 90%, which does not make sense to me. Can someone explain this behavior?

My quick fix is to increase the size of the EBS volume.

Response to @Val's question: yes, a large number of documents are marked for deletion (see docs.deleted):

"5": {
    "health": "yellow",
    "status": "open",
    "index": "local-deals-1624295772015",
    "uuid": "s7QDLtuhRN6HM_VwtVTB0Q",
    "pri": "6",
    "rep": "1",
    "docs.count": "8911560",
    "docs.deleted": "18826270",
    "store.size": "37gb",
    "pri.store.size": "19.9gb"
}
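
To put those two counters in perspective, here is a small sketch of the arithmetic. It only uses the docs.count and docs.deleted values shown above; the function name is made up for illustration. Size on disk is not strictly proportional to document counts, so treat the result as a rough indicator rather than an exact measure of reclaimable space.

```python
# Figures copied from the index stats in the question.
docs_count = 8_911_560      # live documents
docs_deleted = 18_826_270   # documents marked as deleted but still held in segments


def deleted_ratio(live: int, deleted: int) -> float:
    """Return the fraction of documents in the segments that are marked deleted."""
    total = live + deleted
    return deleted / total if total else 0.0


ratio = deleted_ratio(docs_count, docs_deleted)
print(f"{ratio:.1%} of the documents held in the segments are marked deleted")
```

Roughly two thirds of the documents stored in this index are deleted ones that have not yet been merged away, which would go a long way toward explaining a large size difference between clusters holding the same live data.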

Solution

  • You can indeed try running _forcemerge. The HTTP call blocks until the merge completes, but the merge itself runs on the cluster: if the client connection is lost or times out, the merge keeps running in the background until the job is done. You therefore don't need to wait for the call to return in order to force merge segments.

    Also note that this will not remove all deleted documents, but a good share of them, depending on the ratio of deleted documents to total documents.

    You can find more info on the different merge settings in the MergePolicyConfig.java class.
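
A minimal sketch of triggering such a merge from Python, using only the standard library. The base URL, the helper names and the timeout are assumptions for illustration and are not from the original answer. The only_expunge_deletes=true parameter asks Elasticsearch to merge only segments that contain deleted documents, and (as far as I know) segments below the index.merge.policy.expunge_deletes_allowed threshold are skipped, which is one reason not every deleted document disappears.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_forcemerge_url(base_url: str, index: str,
                         only_expunge_deletes: bool = True) -> str:
    """Build the _forcemerge URL for one index (hypothetical helper)."""
    query = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"


def force_merge(base_url: str, index: str, timeout: float = 30.0) -> dict:
    """POST the force merge request and return the parsed JSON response.

    If the client times out first, urlopen raises an exception, but the
    merge is expected to keep running on the cluster in the background.
    """
    request = urllib.request.Request(build_forcemerge_url(base_url, index),
                                     method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, force_merge("http://localhost:9200", "local-deals-1624295772015") would send the request to a node assumed to be listening on localhost. A force merge needs free disk space while it rewrites segments, so on a node that is already full it may be necessary to grow the volume first.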