database, elasticsearch, merge, clone, reindex

Elasticsearch merge data from multiple indexes into merged index


My company uses off-the-shelf software that exports its logs to Elasticsearch (and also reads them back). The software creates one index per day for every data type, for example: "A" record data => A_Data_2022_12_13, A_Data_2022_12_14, and so on. Because of this storage scheme, our cluster has thousands of shards for only 100GB of data. I want to merge all those shards into a small number of shards, 1 or 2 per data type.

I considered reindexing, but it seems like overkill for my purpose: I want the data to stay exactly as it is now, just merged into one shard.

What is the best practice for this? Thanks!

I tried reindexing, but it takes a long time, and I don't think it is the right solution.


Solution

  • Too many shards can cause excessive heap usage, and unbalanced shards can cause hot spots in the cluster. Your instinct is right: you should combine the small indices into one or a few larger indices. That gives you fewer, more evenly sized shards and therefore a more stable cluster.

    What can you do?

    1. Create a rollover index and point your indexer at it. New data will then be stored in the new index, so you only need to worry about the existing data.
    2. Use a filtered alias to search your data.
    3. Reindex or wait. New data now goes into the new index, but what about the existing indices? There are two options. Assuming you have an index retention period, you can wait until all the per-day indices have been deleted, or you can reindex their data right away.
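
As a rough illustration of steps 1 and 2, the sketch below only builds the REST requests (method, path, JSON body) that you could send with curl, Kibana Dev Tools, or any client. The index prefix `a_data`, the alias names, the `@timestamp` field, and the rollover thresholds are all hypothetical placeholders, not values taken from the question.

```python
import json


def rollover_setup(prefix, write_alias, max_age="30d", max_size="50gb"):
    """Build the requests for a rollover-managed index (step 1).

    All names and thresholds are illustrative assumptions.
    """
    # Bootstrap the first index and mark it as the alias's write index.
    bootstrap = (
        "PUT",
        f"/{prefix}-000001",
        {"aliases": {write_alias: {"is_write_index": True}}},
    )
    # Roll over to a new index once any condition is met.
    rollover = (
        "POST",
        f"/{write_alias}/_rollover",
        {"conditions": {"max_age": max_age, "max_size": max_size}},
    )
    return [bootstrap, rollover]


def filtered_alias(index_pattern, alias, field, since):
    """Build the request for a filtered search alias (step 2)."""
    body = {
        "actions": [
            {
                "add": {
                    "index": index_pattern,
                    "alias": alias,
                    # Only documents matching this filter are visible via the alias.
                    "filter": {"range": {field: {"gte": since}}},
                }
            }
        ]
    }
    return ("POST", "/_aliases", body)


if __name__ == "__main__":
    requests = rollover_setup("a_data", "a_data_write")
    requests.append(filtered_alias("a_data-*", "a_data_recent", "@timestamp", "now-7d/d"))
    for method, path, body in requests:
        print(method, path)
        print(json.dumps(body, indent=2))
```

The indexer would then write to the `a_data_write` alias instead of a dated index name, and searches would go through the filtered alias.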

    Note: You can speed up the reindex with the slices parameter and by setting number_of_replicas to 0 on the destination index while the reindex runs.
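
A minimal sketch of that tuning, again only building the requests rather than sending them. The source pattern `a_data_2022_*` and the destination name `a_data_merged` are hypothetical, and the destination name must not match the source pattern. Note that with `slices=auto` Elasticsearch derives the slice count from the source shards, so for single-shard daily indices an explicit number may be worth trying instead.

```python
from urllib.parse import urlencode


def merge_plan(source_pattern, dest_index, shards=1, slices="auto", final_replicas=1):
    """Build the three requests for merging many small indices into one.

    Index names and shard/replica counts are illustrative assumptions.
    """
    # 1. Create the destination with few shards and no replicas,
    #    so the bulk copy is not duplicated onto replica shards.
    create = (
        "PUT",
        f"/{dest_index}",
        {"settings": {"number_of_shards": shards, "number_of_replicas": 0}},
    )
    # 2. Start a sliced reindex as a background task.
    query = urlencode({"slices": slices, "wait_for_completion": "false"})
    reindex = (
        "POST",
        f"/_reindex?{query}",
        {"source": {"index": source_pattern}, "dest": {"index": dest_index}},
    )
    # 3. Restore replicas once the copy has finished.
    restore = (
        "PUT",
        f"/{dest_index}/_settings",
        {"index": {"number_of_replicas": final_replicas}},
    )
    return [create, reindex, restore]


if __name__ == "__main__":
    for method, path, body in merge_plan("a_data_2022_*", "a_data_merged"):
        print(method, path, body)
```

After verifying the document counts in the merged index, the old per-day indices can be deleted to actually free the shards.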
