elasticsearch, elasticsearch-mapping, elasticsearch-model

Elasticsearch - implications of splitting documents into separate indexes


Let's say I have 100,000 documents from different customer groups. They all share the same format and contain the same type of information.

Documents from individual customer groups are refreshed at different times of the day. It has been recommended that I give each customer group its own index, so that when a customer group's data is refreshed locally, I can create a new index for that customer and delete the old one.
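The refresh strategy described above (build a fresh index for one customer group, repoint the alias, then delete the old copy) can be sketched as a short list of Elasticsearch REST calls. This is a minimal sketch under stated assumptions: the alias name `customers`, the `customer-<group>-v<N>` naming scheme and the helpers `index_name` and `refresh_plan` are all hypothetical, every customer index is assumed to sit behind one shared alias, and the requests are only built as plain data rather than sent with a client. The endpoints themselves (`PUT /<index>`, `POST /_aliases`, `DELETE /<index>`) are the standard create-index, alias and delete-index APIs.

```python
import json
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None). Plain data, so no cluster is needed.
Request = Tuple[str, str, Optional[dict]]


def index_name(group: str, version: int) -> str:
    """Hypothetical naming scheme: one index per customer group and per load."""
    return f"customer-{group}-v{version}"


def refresh_plan(alias: str, group: str, version: int) -> List[Request]:
    """Requests that replace one customer group's index behind a shared alias."""
    new_index = index_name(group, version)
    # Assumption: the previous load of this group used version - 1.
    old_index = index_name(group, version - 1) if version > 1 else None

    # 1. Create the fresh index. In practice this PUT would carry mappings and
    #    settings, and documents would be bulk-loaded (POST /<index>/_bulk)
    #    before the alias is moved.
    plan: List[Request] = [("PUT", f"/{new_index}", None)]

    # 2. One _aliases request: Elasticsearch applies all of its actions
    #    atomically, so searches on the alias see either the old copy or the new.
    actions: List[dict] = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    plan.append(("POST", "/_aliases", {"actions": actions}))

    # 3. The old copy is no longer reachable through the alias; delete it.
    if old_index is not None:
        plan.append(("DELETE", f"/{old_index}", None))
    return plan


for method, path, body in refresh_plan("customers", "acme", 2):
    print(method, path, json.dumps(body) if body else "")
```

Because the remove and add actions travel in a single `_aliases` request, a search against the alias should not see both copies of a customer's data, or neither, during the swap; the old index is deleted only afterwards.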

What are the implications of splitting the data into multiple indexes and querying them through an alias? Specifically:

  • Will it increase my server HDD requirements?
  • Will it increase my server RAM requirements?
  • Will Elasticsearch be slower to search, since querying the alias means querying all the indexes behind it?

Thank you for any help or advice.


Solution

  • Every index adds some overhead at every level (memory, disk and query time), but it is usually small. For 100,000 documents I would question the need for splitting unless the documents are very large. In general, each additional index will:

    1. Require some RAM for insert buffers and other per-index tasks

    2. Have its own merge overhead on disk, compared with a single larger index

    3. Add some latency at query time when a query spans multiple indexes, because the results from each index have to be merged

    Many factors determine whether any of these costs is significant. If you have plenty of RAM, several CPUs and SSDs, you may be fine.

    I would advise you to build a solution that uses as few shards as possible. That probably means a single index, or at most a few.
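    The costs listed above can be made concrete with a simplified, hypothetical model. `total_shards` counts the shard copies the cluster must host (each primary plus its replicas), and `scatter_gather` mimics the query phase: every shard behind the alias returns its own local top hits, and the coordinating node merges them into a global top list. This is not Elasticsearch's actual implementation; the helper names, scores, document ids and the figure of 50 customer groups are invented for illustration, and one primary shard with one replica per index is assumed (the defaults since Elasticsearch 7.0).

```python
import heapq
from typing import List, Tuple

# A hit is (relevance score, document id). Simplified, hypothetical model.
Hit = Tuple[float, str]


def total_shards(num_indexes: int, primaries: int, replicas: int) -> int:
    """Shard copies the cluster must host: every primary plus its replicas."""
    return num_indexes * primaries * (1 + replicas)


def scatter_gather(shards: List[List[Hit]], size: int) -> Tuple[List[Hit], int]:
    """Simplified query phase over every shard behind an alias.

    Each shard returns its local top `size` hits; the coordinator merges all
    candidates and keeps the global top `size`. Returns the merged hits and
    the number of candidates the coordinator had to merge.
    """
    candidates: List[Hit] = []
    for shard in shards:
        candidates.extend(heapq.nlargest(size, shard))
    return heapq.nlargest(size, candidates), len(candidates)


docs = [(0.9, "a"), (0.2, "b"), (0.7, "c"), (0.4, "d"), (0.8, "e"), (0.1, "f")]

# Same six documents: one single-shard index vs. three single-shard indexes.
print(scatter_gather([docs], 2))                                  # ([(0.9, 'a'), (0.8, 'e')], 2)
print(scatter_gather([docs[0:2], docs[2:4], docs[4:6]], 2))       # ([(0.9, 'a'), (0.8, 'e')], 6)

# Hypothetical 50 customer groups, one index each, vs. one shared index.
print(total_shards(1, 1, 1), total_shards(50, 1, 1))              # 2 100
```

    The merged top hits are identical either way, but splitting the same documents across three indexes triples the candidates the coordinator must merge, and 50 per-customer indexes mean 100 shard copies to keep in memory and merge on disk instead of 2. That is the reasoning behind keeping the shard count low.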