
How do I check for duplicate data on ElasticSearch?


When storing a batch of documents, Elasticsearch should store only those that do not already exist and ignore the rest. Should this be handled at the application level, for example by checking whether each document's id already exists?


Solution

  • Here is what the documentation states:

    Operation Type

    The index operation also accepts an op_type that can be used to force a create operation, allowing for “put-if-absent” behavior. When create is used, the index operation will fail if a document by that id already exists in the index.

    Here is an example of using the op_type parameter:

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elastic Search"
    }'
    

    Another option to specify create is to use the following uri:

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elastic Search"
    }'
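
To get the "store the nonexistent and ignore the rest" behavior from application code, one option is to send the same op_type=create request and treat a conflict response as "already stored". The sketch below is only an illustration: the helper name create_if_absent is hypothetical, the document URL is taken from the curl example above, and it assumes Elasticsearch answers a rejected create with HTTP 409 (Conflict). The Content-Type header is included because newer Elasticsearch versions require it.

```python
import json
import urllib.error
import urllib.request


def create_if_absent(doc_url, doc, opener=urllib.request.urlopen):
    """PUT `doc` with op_type=create.

    Returns True if the document was stored, False if a document with
    that id already existed. `opener` is injectable so the helper can be
    exercised without a running cluster (hypothetical design choice).
    """
    request = urllib.request.Request(
        doc_url + "?op_type=create",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with opener(request):
            return True  # any 2xx response: the document was created
    except urllib.error.HTTPError as err:
        if err.code == 409:  # assumed status for "id already exists"
            return False  # duplicate: ignore it
        raise  # any other error is a real failure


if __name__ == "__main__":
    tweet = {
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "trying out Elastic Search",
    }
    # Requires an Elasticsearch node listening on localhost:9200.
    print(create_if_absent("http://localhost:9200/twitter/tweet/1", tweet))
```

This avoids a separate "does the id exist?" lookup before each write, so there is no window between the check and the insert in which another writer could store the same id.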