I am trying to index a document only if it doesn't already exist in Elasticsearch. I am using BulkProcessor to index my documents, adding them with the Requests.add action. Sometimes I will have the exact same id. In that case, does it skip the add automatically, or does it update the existing document?
P.S. Updating is not a requirement; the existing document can stay as is.
P.S.2 I am trying to integrate a user's past tweets into elasticsearch-twitter-river's user stream.
If you index a doc with a document id that already exists, Elasticsearch will do an update. Otherwise it will add a new document.
In other words, if you PUT a doc to {index}/{type}/{id}, then it will always update (overwrite) the document with that id. If you POST a doc to {index}/{type}, then in general Elasticsearch will generate a new document for each of your POSTs. That is, unless you mapped a document field to the _id field in mappings.
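To make the difference concrete, here is a minimal toy model of those semantics in plain Java. It is only a sketch: ToyIndex and its methods are hypothetical names backed by an in-memory map, not the Elasticsearch client API. The create method models a create-only write, which is roughly what the index API's op_type=create option gives you if you want an existing document left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy in-memory model of the indexing semantics described above.
// This is NOT the Elasticsearch client API; every name here is hypothetical.
class ToyIndex {
    private final Map<String, String> docs = new HashMap<>();

    // Like PUT {index}/{type}/{id}: stores the doc, overwriting any existing one.
    // Returns true if a new document was created, false if one was overwritten.
    boolean index(String id, String source) {
        return docs.put(id, source) == null;
    }

    // Like an index request with op_type=create: stores only if the id is unused.
    // Returns true if stored, false if the id already existed (doc left untouched).
    boolean create(String id, String source) {
        return docs.putIfAbsent(id, source) == null;
    }

    // Like POST {index}/{type}: a fresh id is generated for every call.
    String indexWithGeneratedId(String source) {
        String id = UUID.randomUUID().toString();
        docs.put(id, source);
        return id;
    }

    String get(String id) {
        return docs.get(id);
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex tweets = new ToyIndex();
        tweets.index("1", "first version");
        tweets.index("1", "second version");      // same id: overwrites
        tweets.create("1", "third version");      // same id: rejected, doc unchanged
        tweets.indexWithGeneratedId("same text");
        tweets.indexWithGeneratedId("same text"); // no id given: two separate docs
        System.out.println(tweets.get("1") + " / " + tweets.size() + " docs");
    }
}
```

With BulkProcessor the same distinction applies per request: a plain index request with an explicit id behaves like index above, while a create-only request fails for that item instead of overwriting.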
It seems that the Twitter River uses the PUT method and explicitly specifies the id, so tweets with the same id will probably be overwritten.