elasticsearch elasticsearch-java-api

How to delete data from Elasticsearch through the Java API


EDITED: I'm trying to find out how to delete data from Elasticsearch according to some criteria. I know that older versions of Elasticsearch had a Delete By Query feature, but it had really serious performance issues, so it was removed. I also know that there is a Java plugin for delete by query:

org.elasticsearch.plugin:delete-by-query:2.2.0

But I don't know whether it has a better-performing delete implementation or is the same as the old one.

Also, someone suggested using scroll to remove data, but I only know how to retrieve data by scrolling, not how to use scroll to delete!

Does anyone have an idea? The number of documents to remove in one call would be huge (over 50k documents).

Thanks in advance!

Finally used this guy's third option


Solution

  • You are correct that you want to use the scroll/scan. Here are the steps:

    1. Begin a new scroll/scan.
    2. Get the next N records.
    3. Take the IDs from each record and do a bulk delete of those IDs.
    4. Go back to step 2, and stop once the scroll returns no more records.

    So you don't actually delete through the scroll/scan; you just use it as a tool to get the IDs of all the records you want to delete. This way you delete only N records at a time rather than all 50,000 in one chunk (which would cause you all kinds of problems).
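
    The steps above can be sketched roughly as follows. This is only an illustration of the loop, not the Elasticsearch client itself: `IdScroll`, `BulkDeleter` and `ScrollDelete` are hypothetical names made up for this example. With the 2.x Java transport client they would presumably wrap a scrolled search (`prepareSearch(...).setScroll(...)` followed by `prepareSearchScroll(...)`) and a bulk request built with `prepareBulk()` and `prepareDelete(...)`; treat that mapping as an assumption and check it against your client version.

```java
import java.util.List;

/** Hypothetical stand-in for a scroll over the documents matching your criteria. */
interface IdScroll {
    /** Returns the next batch of at most batchSize IDs, or an empty list when the scroll is exhausted. */
    List<String> nextBatch(int batchSize);
}

/** Hypothetical stand-in for one bulk request containing a delete action per ID. */
interface BulkDeleter {
    void deleteAll(List<String> ids);
}

final class ScrollDelete {
    private ScrollDelete() {}

    /**
     * Steps 2 to 4 of the answer: fetch N IDs, bulk-delete them, repeat
     * until the scroll returns nothing. Returns the number of IDs submitted
     * for deletion.
     */
    static long deleteByScroll(IdScroll scroll, BulkDeleter deleter, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long submitted = 0;
        while (true) {
            List<String> ids = scroll.nextBatch(batchSize); // step 2
            if (ids.isEmpty()) {
                break;                                      // scroll exhausted
            }
            deleter.deleteAll(ids);                         // step 3
            submitted += ids.size();
        }                                                   // step 4: loop
        return submitted;
    }
}
```

    For 50,000 documents and a batch size of 1,000 this issues 50 bulk requests of 1,000 deletes each instead of one huge request. A real implementation would also want to inspect each bulk response for per-item failures, which this sketch leaves out.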