elasticsearch elasticsearch-7

How to limit the maximum number of Elasticsearch documents in an index?


I've installed an Elasticsearch (version 7.x) cluster and created a new index. I want to limit the maximum number of documents in this index to, say, 10,000.

The naive solution is to query the document count before inserting each new document. But this may not be accurate (concurrent writers can race between the count and the insert), and it performs poorly (two requests per insert).

What is the right way to do this?


Solution

  • The best practice is to use Index Lifecycle Management (ILM), which is included in the Basic license and enabled by default in Elastic v7.3+.

    You can define a rollover action based on the number of documents (here max_docs is set to 5):

    PUT _ilm/policy/my_policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_docs": 5
              }
            }
          }
        }
      }
    }
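
    As an optional sanity check, you can read the policy back with the ILM get-policy API. The response should show the hot phase with the rollover action and its max_docs condition:

    GET _ilm/policy/my_policy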
    

    Next, create an index template that applies my_policy:

    PUT _template/my_template
    {
      "index_patterns": [
        "my-index*"
      ],
      "settings": {
        "index.blocks.read_only" : true,
        "index.lifecycle.name": "my_policy",
        "index.lifecycle.rollover_alias": "my-index"
      }
    }
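
    On 7.8 and later, legacy _template templates still work, but composable index templates are the newer mechanism. Assuming such a version, a roughly equivalent composable template with the same settings might look like this (reusing the name my_template is only for illustration). If a legacy and a composable template both match an index name, the composable one takes precedence, so use one or the other:

    PUT _index_template/my_template
    {
      "index_patterns": [
        "my-index*"
      ],
      "template": {
        "settings": {
          "index.blocks.read_only": true,
          "index.lifecycle.name": "my_policy",
          "index.lifecycle.rollover_alias": "my-index"
        }
      }
    }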
    

    Note the setting "index.blocks.read_only": true. Every index created from this template starts read-only, including the new index created by the rollover, so once the rollover happens, further writes through the alias are rejected.

    Now create the first index. It overrides the read-only setting and is marked as the write index of the my-index alias:

    PUT my-index-000001
    {
      "settings": {
        "index.blocks.read_only": false
      },
      "aliases": {
        "my-index": {
          "is_write_index": true
        }
      }
    }
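
    To check that the alias exists and directs writes to the first index, you can inspect it. The response should list my-index-000001 with "is_write_index": true:

    GET _alias/my-index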
    

    That's it! Once the index holds 5 documents, the rollover creates a new read-only index and the alias sends writes to it, so further indexing requests through the alias are rejected.

    You can test it by indexing some new documents through the alias:

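    # POST without an ID creates a new document on each call (PUT with a fixed ID would only overwrite the same document)
    POST my-index/_doc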
    {
      "field" : "value"
    }
    

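    Also, ILM evaluates policies only every 10 minutes by default (indices.lifecycle.poll_interval), so an index can temporarily exceed max_docs before the rollover is triggered. To test faster, you can shorten the interval: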

    PUT /_cluster/settings
    {
      "persistent": {
        "indices.lifecycle.poll_interval": "5s"
      }
    }
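
    When you finish testing, you may want to restore the default interval, since very frequent polling adds some overhead to the cluster. Setting a persistent cluster setting to null resets it to its default:

    PUT /_cluster/settings
    {
      "persistent": {
        "indices.lifecycle.poll_interval": null
      }
    }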