python elasticsearch aws-lambda aws-elasticsearch

How to get the number of shards being used in the cluster - Python


I am creating a Lambda function that will hold a Python script. Its purpose is to alert when shard usage in a cluster exceeds 90%, i.e. when 4500 of 5000 shards are in use. I am using the Elasticsearch client for Python, so I want to know whether it has any methods that let you calculate the remaining open shard capacity. Thanks in advance.


Solution

  • You can use the Elasticsearch Python client and run something along the lines of:

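    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # pass your cluster endpoint and credentials here as needed
    stats = es.cluster.stats()
    shards = stats['indices']['shards']['total']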
    

    This uses the client's cluster stats method to fetch the cluster's statistics; from there you can read the number of shards across all indices, along with the other information you need.

    If you want to alert at some percentage of the shard limit, find your maximum number of shards per node, get the number of data nodes from the stats, and multiply the two to get the cluster-wide maximum. Then decide whether to notify with something like notify = shards / max_shards > 0.9.

    EDIT:

    Here's a more complete code example:

    threshold = 0.9
    cluster_stats = es.cluster.stats()
    cluster_settings = es.cluster.get_settings()
    total_shards = cluster_stats['indices']['shards']['total']
    data_nodes = cluster_stats['nodes']['count']['data']
    # Raises KeyError if cluster.max_shards_per_node was never set explicitly as a persistent setting
    max_shards_node = int(cluster_settings['persistent']['cluster']['max_shards_per_node'])
    # Check whether the current shard count exceeds the threshold fraction of total capacity
    notify = total_shards / (data_nodes * max_shards_node) > threshold
    if notify:
        # you choose how to handle notification
        print(f"Shard usage alert: {total_shards} shards in use")
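
The calculation above can also be split into small functions that are easier to test outside Lambda. What follows is only a sketch under a few assumptions: the helper names (`max_shards_per_node`, `shard_usage`, `check_shards`) are made up for illustration, the client is assumed to return dict-like responses, and the fallback of 1000 is Elasticsearch's documented default for `cluster.max_shards_per_node`, used here when the setting has not been set explicitly. Verify that default against your own cluster or managed service before relying on it.

```python
from typing import Any, Mapping

# Assumption: Elasticsearch's documented default for cluster.max_shards_per_node.
DEFAULT_MAX_SHARDS_PER_NODE = 1000


def max_shards_per_node(settings: Mapping[str, Any],
                        default: int = DEFAULT_MAX_SHARDS_PER_NODE) -> int:
    """Read cluster.max_shards_per_node from a get_settings() response.

    Transient settings override persistent ones; if neither is set,
    fall back to `default`.
    """
    for scope in ("transient", "persistent"):
        value = settings.get(scope, {}).get("cluster", {}).get("max_shards_per_node")
        if value is not None:
            return int(value)
    return default


def shard_usage(stats: Mapping[str, Any], settings: Mapping[str, Any]) -> float:
    """Return the fraction of the cluster-wide shard limit currently in use."""
    # "total" can be missing when the cluster has no indices yet.
    total_shards = stats["indices"]["shards"].get("total", 0)
    data_nodes = stats["nodes"]["count"]["data"]
    capacity = data_nodes * max_shards_per_node(settings)
    if capacity == 0:
        raise ValueError("cluster reports no data nodes")
    return total_shards / capacity


def check_shards(client: Any, threshold: float = 0.9) -> dict:
    """Query the cluster through `client` and report whether to alert."""
    usage = shard_usage(client.cluster.stats(), client.cluster.get_settings())
    return {"usage": usage, "alert": usage > threshold}
```

In a Lambda handler you would build the `Elasticsearch` client once, call `check_shards(es)`, and send the notification when `alert` is true. Note that the comparison is strict, so exactly 4500 of 5000 shards (0.9) does not trigger an alert; use `>=` if it should.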