Elasticsearch python API: Delete documents by query


I see that Elasticsearch provides a delete-by-query API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

But I want to do the same thing with the Elasticsearch bulk API. I can already use bulk to upload documents with

es.bulk(body=json_batch)

but I am not sure how to invoke a delete by query through the Python bulk API for Elasticsearch.


Solution

  • Since Elasticsearch has deprecated the delete-by-query API, I created this Python script that uses the bindings to do the same thing. First, define an ES connection:

    import elasticsearch
    es = elasticsearch.Elasticsearch(['localhost'])
    

    Now you can use that connection to run a query that matches the documents you want to delete.

    search=es.search(
        q='The Query to ES.',
        index="*logstash-*",
        size=10,
        search_type="scan",
        scroll='5m',
    )
    

    Now scroll through the results of that query in a loop, building the bulk request as you go.

    import json

    scroll_id = search['_scroll_id']
    while True:
        try:
            # Get the next page of results.
            scroll = es.scroll(scroll_id=scroll_id, scroll='5m')
        # Stop if the scroll context has expired or been cleared.
        except elasticsearch.exceptions.NotFoundError:
            break
        hits = scroll['hits']['hits']
        # An empty page means every matching document has been visited.
        if not hits:
            break
        # Continue from the most recent scroll id on the next iteration.
        scroll_id = scroll['_scroll_id']
        # Build one newline-terminated delete action per hit.
        bulk = ""
        for result in hits:
            action = {"delete": {"_index": result['_index'],
                                 "_type": result['_type'],
                                 "_id": result['_id']}}
            bulk += json.dumps(action) + "\n"
        # Finally, do the deleting.
        es.bulk(body=bulk)
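On Elasticsearch 5.0 and later, delete by query is available again as the `_delete_by_query` endpoint (`es.delete_by_query` in the Python client), so that is worth checking first. If you still want the scroll-and-bulk route, the client's `elasticsearch.helpers.scan` and `elasticsearch.helpers.bulk` can handle the scrolling and build the newline-delimited body for you. The sketch below assumes that setup; `delete_actions` is a hypothetical helper name, and `_type` is copied only when a hit carries one, since newer clusters no longer use mapping types.

```python
from typing import Any, Dict, Iterable, Iterator


def delete_actions(hits: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Turn search hits into delete actions for elasticsearch.helpers.bulk.

    `delete_actions` is a hypothetical helper, not part of the client library.
    """
    for hit in hits:
        action = {
            "_op_type": "delete",  # tells helpers.bulk to emit a delete action
            "_index": hit["_index"],
            "_id": hit["_id"],
        }
        # Older clusters still use mapping types; copy the type through if present.
        if "_type" in hit:
            action["_type"] = hit["_type"]
        yield action


# Usage against a live cluster (not run here):
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch(["localhost"])
#   hits = helpers.scan(es, index="*logstash-*", q="The Query to ES.")
#   helpers.bulk(es, delete_actions(hits))
```

Because `delete_actions` is a generator, hits are streamed from the scroll into bulk requests without holding every matching document in memory at once.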
    

    To use the bulk API, you need to ensure two things:

    1. The document you want to delete is identified (index, type, id).
    2. Each action line, including the last one, is terminated with a newline (\n).
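Those two rules can be checked in isolation with a small body builder. This is a sketch, not part of the client: `bulk_delete_body` is a hypothetical helper name, and it assumes hits shaped like the `scroll['hits']['hits']` entries above. A delete action has no source line after it, so each hit produces exactly one line.

```python
import json
from typing import Any, Dict, Iterable


def bulk_delete_body(hits: Iterable[Dict[str, Any]]) -> str:
    """Build a newline-delimited bulk body with one delete action per hit.

    `bulk_delete_body` is a hypothetical helper name used for illustration.
    """
    lines = []
    for hit in hits:
        # Rule 1: identify the document by index, type (if present) and id.
        target = {"_index": hit["_index"], "_id": hit["_id"]}
        if "_type" in hit:
            target["_type"] = hit["_type"]
        lines.append(json.dumps({"delete": target}))
    # Rule 2: every line, including the last one, ends with "\n".
    return "".join(line + "\n" for line in lines)
```

The resulting string can be passed as `es.bulk(body=...)`. Skip the call when the string is empty, since Elasticsearch rejects a bulk request that contains no actions.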