I'm scraping a large set of items using node.js/request and mapping the fields to Elasticsearch documents. The original documents have an ID field which never changes:
{ id: 123456 }
Periodically, I'd like to "refresh" and see which original items are no longer available, for whatever reason. Currently, I have a script which scrapes directly and simply inserts into Elastic.
Is there a way to check if an item with the same ID already exists before doing an insert? I don't want to end up with a ton of duplicates.
Are you using your ID as the document _id? If so, this is easy: use the create operation type, which creates a document with a specific ID only if it does not already exist and never overwrites an existing one:
PUT your-index/your-type/123456/_create
{
"foo" : "bar"
}
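If a document with that _id already exists, the create call is rejected with an HTTP 409 Conflict instead of overwriting it, so the scraper can treat 409 as "already indexed". Below is a minimal Node.js sketch of that idea, not a definitive implementation. The helper names buildCreateRequest and interpretCreateStatus are hypothetical, the base URL is an assumption, and your-index/your-type are the same placeholders as in the request above.

```javascript
// Hypothetical helpers for illustration; they are not part of any library.

// Build the options for a create-only request, using the scraped id as _id.
function buildCreateRequest(baseUrl, index, type, doc) {
  return {
    method: 'PUT',
    // The _create endpoint refuses to overwrite an existing document.
    url: `${baseUrl}/${index}/${type}/${encodeURIComponent(doc.id)}/_create`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc),
  };
}

// Translate the HTTP status of the create call into an outcome.
function interpretCreateStatus(statusCode) {
  if (statusCode === 201) return 'created'; // new document was indexed
  if (statusCode === 409) return 'exists';  // a document with this _id is already there
  throw new Error(`Unexpected status from Elasticsearch: ${statusCode}`);
}

// Example with the placeholder names from the answer (base URL is assumed).
const options = buildCreateRequest(
  'http://localhost:9200', 'your-index', 'your-type', { id: 123456, foo: 'bar' }
);
console.log(options.method, options.url);
```

Since you are already using request, the options object should be usable directly as request(options, callback), with the callback passing res.statusCode to interpretCreateStatus to decide whether the item was new or a duplicate.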