twitter, elasticsearch, twitter4j, twitter-streaming-api

Index if not exists using bulk processor in elasticsearch


I am trying to index a document only if it doesn't already exist in Elasticsearch. I am using BulkProcessor to index my documents, with the Requests.add action. Sometimes a document will have exactly the same id as one that is already indexed. In that case, is the document updated instead of being added?

P.S. Updating is not a requirement; the existing document can stay as it is.

P.S.2 I am trying to integrate a user's past tweets into elasticsearch-twitter-river's user stream.


Solution

  • If you index a document with an id that already exists, Elasticsearch updates (overwrites) the existing document. Otherwise it adds a new document.

    In other words, if you PUT a doc to {index}/{type}/{id}, Elasticsearch creates the document when that id is new and otherwise overwrites the existing document with that id. If you POST a doc to {index}/{type}, Elasticsearch will in general create a new document with a generated id for each POST. The exception is when you have mapped a document field to the _id field in the mappings.

    It seems that the Twitter River uses the PUT method and specifies the id explicitly, so tweets with the same id will probably be overwritten.
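
If the goal is to leave an existing tweet untouched rather than overwrite it, Elasticsearch's bulk API also accepts a `create` action in place of `index`. An item whose id already exists is then reported as failed in the bulk response instead of replacing the stored document. The sketch below only builds such a request body in plain Java, so it runs without the Elasticsearch client. The class name `BulkCreateBody`, the index name `tweets`, and the sample documents are hypothetical. Older versions that still use types would presumably also need a `_type` entry in each action line. With the Java client, the likely equivalent is setting the op type of each index request to create before adding it to the BulkProcessor, which is worth checking against the client version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds an NDJSON body for the _bulk endpoint using "create" actions (illustrative helper). */
final class BulkCreateBody {

    /** Escapes backslashes and double quotes only; control characters are not handled here. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * @param index    target index name (hypothetical in this sketch)
     * @param docsById document id mapped to its already-serialized, single-line JSON source
     */
    static String build(String index, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : docsById.entrySet()) {
            // "create" is rejected for an id that already exists; "index" would overwrite it.
            body.append("{\"create\":{\"_index\":").append(quote(index))
                .append(",\"_id\":").append(quote(doc.getKey())).append("}}\n");
            // Each source document goes on its own line, terminated by a newline.
            body.append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tweets = new LinkedHashMap<>();
        tweets.put("1001", "{\"text\":\"first tweet\"}");
        tweets.put("1002", "{\"text\":\"second tweet\"}");
        System.out.print(build("tweets", tweets));
    }
}
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint. Failed items for ids that already exist can simply be ignored, since updating is not required here.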