Rolling Upgrade
Following the documentation here, I upgraded an Elasticsearch cluster from version 5.4 to version 5.6.
This process caused the disk usage on each upgraded node to double.
The following documentation made me assume that this problem would be rectified once the entire cluster was upgraded:
During a rolling upgrade the cluster will continue to operate as normal. Any new functionality will be disabled or work in a backward compatible manner until all nodes of the cluster have been upgraded. Once the upgrade is completed and all nodes are on the new version, the new functionality will become operational. Once that has happened, it is practically impossible to go back to operating in a backward compatible mode. To protect against such a scenario, nodes from the previous major version (e.g. 5.x) will not be allowed to join a cluster where all nodes are of a higher major version (e.g. 6.x).
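For reference, the per-node procedure was the standard rolling-upgrade sequence from that documentation; roughly the following, with localhost standing in for each node's address:

# Disable shard allocation so the cluster doesn't rebalance while the node is down
$ curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
    "transient": { "cluster.routing.allocation.enable": "none" }
  }'
# Synced flush speeds up shard recovery when the node comes back (5.x)
$ curl -XPOST 'localhost:9200/_flush/synced'
# ...stop the node, upgrade the package, start the node, wait for it to rejoin...
# Re-enable allocation and wait for green before moving to the next node
$ curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
    "transient": { "cluster.routing.allocation.enable": "all" }
  }'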
Disk Usage Doubled
However, even once every node in the cluster was on the same version (5.6.16), each node that held data from before the upgrade continues to hold onto that data, and the cluster does not clean it up. This means disk usage has effectively doubled, and I've had to adjust the high/low disk watermark values simply to allow cluster operation to continue.
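(The watermark change was made through the cluster settings API; the values below are illustrative, not a recommendation:)

$ curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "90%",
      "cluster.routing.allocation.disk.watermark.high": "95%"
    }
  }'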
I do not understand the internals of Elasticsearch's data layout well enough to know which directories need to be deleted and which should be kept. Perhaps there is even a built-in way to force the cleanup.
Small example from the data directory:
$ ls nodes/0/indices/
-AW6zio6TuynJ_KQlEooVA 66O4EMc0R3KYclO50uRQ1g CuiQlU_dTDOVkJbV9oQIGw J_YG4HlBRYeBWp0wc0L_Nw Qw-3eYh6TlGpHCys2GBdwg YRrQono1QCWxXahJIT0hfg eK0di6WSRnumTUHiqAQ3gw m3EDPIOqS9mx22k6hQH2yA umpdodA2QR6de8VcuR3qag
-ItvQ5StRECde2zvdV-Ucg 6FAopyspSLu8NGEUekOwhg CzqDKml3QCG16g0zxgnG7g JgbfkCt5RDGmpFowxnYIiA QwerlX68SaqhEzg-Ml3i0Q YcbvcSuxQtaNIXcxU6mpJQ eOww30E7R96ymTqhQyNYng mAspUVrETLuAP6zapD8IVQ uoMcWwmfTeCTKXchAyVt-w
-K59oANFSmmopPt2r5yjYg 6GWAf6ITT4e_9HNwjYlk3g D3Co7Ht1SROlEGCHcSn67Q JrSXkDEETfS8XHe-PH-9qw RB6LxG9uT_eW8Z28Zh4b9A YjHJMVZjRK-8Coya8eBMOA eVtK6_HrTA-1yAfDEnKZnw mG_NCeR3RhSQO6tLRYmJGg upHl_Bu7R0eFZUxU5qrDrA
-WevSR0jRZKTz7CH5LWKOQ 6L6MDgW0QCWLn0lr6NwRUA DEB0-vP7TMmyBK8M18sJ2A JtXS6yJPQwGKhC0qAulNBQ RKcX1apNTsyod54oLYnJ0A YuLmawshTn-WCPPD8Hs8YA ecrbXDCdSleo6Y2_p6SDeg mHOr6_WMT4ODxBGh1e5MCw uq9BlreyTk-xXM-HTsmesw
-jjL_BjFTFycO83wVW4L6Q 6LD31skNSbGVgscF784PnQ DG1ESvHdS1y8AzbbqhML6Q Ju6ks-W5Q4yX0GggfO3hQw RKcvj2kwRe6OBspnZBFrjA Yuu9nCSfTjCqEwcznS1Oqw eq6QwBMaTI2fik81gyD6gQ mKXR0uWtTjenFFkq0GVP8g utoyyWn3SY23rKrg8sCwpg
-t4M8dc9TZiKYZI7Mia8hQ 6Rw8yFOhSvqveDoWf19F5A DOJaKVahTvm7G79RIfpGhQ K-a5KU8hT-WSQw1cPAWXhw RPhKOIYNRoKQYHPauPpYzQ Z0GgoShfR2iGidFa-fXhzQ f8qpQPOARdeqHcXH3OFBqQ mM_43p8mRsOCosUH2C3iUg uv40fHgkQtCFShozCAmtMQ
04JbWXE4S-66wTVQZ6587Q 6XjX8cP1QEuCxalGCmq9bQ Dc-lhr15Qz6sCEdw4smRGQ KAHIxqC2Sm-8Cu-fo4P54A Rd6gkNVkTxitNvGPtmJ8jw Z6c02QTLRz6nrfVEjMQr3g fPJyBMlVQQ2j5oyvyYQKNw mUioLd-hTq2CbRpQ6BMfxw vC3erzIcT1Ked9vGmCGRFw
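Each of these directory names is an index UUID. They can at least be matched against the indices the cluster still knows about, since the cat indices API can print the UUID alongside the index name:

# One line per live index: <index name> <uuid>; directories whose UUIDs
# don't appear here belong to indices the cluster no longer tracks
$ curl 'localhost:9200/_cat/indices?h=index,uuid'

But I'm wary of deleting directories by hand based on that alone.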
How can I clean up the old shards created by version 5.4?
Resolution
I was unable to determine a root cause or a mechanism for reclaiming the doubled disk usage, so instead I re-indexed the entire cluster into a new 7.x cluster over the course of about a month. Once that was complete, I removed the old cluster from the servers it occupied and added those servers to the new cluster, which automatically set about rebalancing itself.
This took quite some time, but it ended up being an effective route, and I did not see any unusual disk usage along the way.
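For anyone taking the same route: the heavy lifting for this kind of migration can be done with the reindex-from-remote API, pulling each index from the old cluster into the new one. A sketch of a single index's call, with placeholder host and index names; the old cluster's address also has to be allowed via reindex.remote.whitelist in the new cluster's elasticsearch.yml:

# elasticsearch.yml on the new cluster (placeholder host):
#   reindex.remote.whitelist: "old-cluster.example.com:9200"

$ curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '{
    "source": {
      "remote": { "host": "http://old-cluster.example.com:9200" },
      "index": "my-index"
    },
    "dest": { "index": "my-index" }
  }'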