Elasticsearch version: 7.10.0
I have an Elasticsearch index with 8 shards on 8 different nodes and more than 25 million documents (nested documents not included). It's a heavily updated index, and the index size grows over time because of deleted documents. I searched for this issue and read posts like the one below, which say that a segment is automatically merged when more than 50% of the documents in that segment are deleted.
https://discuss.elastic.co/t/too-many-deleted-docs/84964/4
I called /_segments for the index and found segments like the one below:
"segments": {
  "_bbx": {
    "generation": 14685,
    "num_docs": 27901732,
    "deleted_docs": 23290932,
    "size_in_bytes": 5071187083,
    "memory_in_bytes": 137008,
    "committed": true,
    "search": true,
    "version": "8.7.0",
    "compound": false,
    "attributes": {
      "Lucene87StoredFieldsFormat.mode": "BEST_SPEED"
    }
  },
The full response of the /_segments call can be found here:
https://drive.google.com/file/d/1mLE2xw0u7lnogHnfzz65rWCBS8JrcnNm/view?usp=sharing
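Rather than checking each segment in the full response by hand, a small Python sketch like the one below could walk a saved /_segments response and report the deleted percentage per segment. It assumes the 7.x response layout (indices → index name → shards → list of shard copies → segments), and the index name `my-index` is hypothetical; the sample data is just the `_bbx` segment shown above. Because `num_docs` counts only live documents, the ratio here is computed against `num_docs + deleted_docs`.

```python
from typing import Iterator, NamedTuple


class SegmentStats(NamedTuple):
    shard: str
    name: str
    num_docs: int
    deleted_docs: int

    @property
    def deleted_pct(self) -> float:
        # num_docs excludes deleted docs, so the segment's total is live + deleted
        total = self.num_docs + self.deleted_docs
        return 100.0 * self.deleted_docs / total if total else 0.0


def iter_segments(response: dict, index: str) -> Iterator[SegmentStats]:
    # Assumed layout: indices -> <index> -> shards -> <shard id> -> [copies] -> segments
    for shard_id, copies in response["indices"][index]["shards"].items():
        for copy in copies:  # primary and replicas are listed as separate copies
            for name, seg in copy["segments"].items():
                yield SegmentStats(shard_id, name, seg["num_docs"], seg["deleted_docs"])


# Sample built from the segment above; "my-index" is a hypothetical index name
sample = {
    "indices": {
        "my-index": {
            "shards": {
                "0": [
                    {"segments": {"_bbx": {"num_docs": 27901732, "deleted_docs": 23290932}}}
                ]
            }
        }
    }
}

for s in iter_segments(sample, "my-index"):
    print(f"shard {s.shard} {s.name}: {s.deleted_pct:.1f}% deleted")
```

With the sample above this prints `shard 0 _bbx: 45.5% deleted`. Since replica copies appear separately in the response, the same segment statistics may show up once per copy.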
In many segments like the one above, the deleted_docs count is more than 75% of num_docs, but the segments are still not getting merged. We haven't set max_merged_segment, so it uses the default of 5gb. We also haven't changed the merge policy and are using the defaults of Elasticsearch 7.10.0.
Is my understanding correct?
Any thoughts on this would be helpful. Thanks in advance.
num_docs counts only the live documents; it does not include the deleted documents.
So in this case the segment holds 23,290,932 deleted documents out of a total of 51,192,664 (27,901,732 + 23,290,932) documents, which means about 45.5% of its documents are deleted. That is below the 50% threshold, so the segment was not merged.
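Written out, the two readings of the same numbers differ only in the denominator: dividing by num_docs alone (the interpretation in the question) gives roughly 83.5%, while dividing by the segment's total document count gives roughly 45.5%.

```latex
\frac{\texttt{deleted\_docs}}{\texttt{num\_docs}}
  = \frac{23{,}290{,}932}{27{,}901{,}732} \approx 0.835
\qquad\text{vs.}\qquad
\frac{\texttt{deleted\_docs}}{\texttt{num\_docs} + \texttt{deleted\_docs}}
  = \frac{23{,}290{,}932}{51{,}192{,}664} \approx 0.455 < 0.5
```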
Note: I posted the same question on the Elasticsearch forums and got this reply: https://discuss.elastic.co/t/elasticsearch-segment-merge-not-happening-when-deleted-documents-count-is-greater-than-50/277209