
Elastic - Bulk upload: index vs. update


I have two distinct bulk uploads to perform, and the order in which they will run is completely unpredictable. In one load I have the fields SERVER_NAME, OS, and PROD_1_VERSION. In the other, I have the fields SERVER_NAME, OS, and PROD_2_VERSION.

My files look like this:

{"index":{"_index" : "myindex", "_id" : "MY_SERVER_1" }}
{"SERVER_NAME":"MY_SERVER_1","OS":"Ubuntu","PROD_1_VERSION":"1.0.0.5" }
{"index":{"_index" : "myindex", "_id" : "MY_SERVER_2" }}
{"SERVER_NAME":"MY_SERVER_2","OS":"Windows10","PROD_1_VERSION":"2.0.0.0" }
{"index":{"_index" : "myindex", "_id" : "MY_SERVER_3" }}
{"SERVER_NAME":"MY_SERVER_3","OS":"Fedora","PROD_1_VERSION":"2.5.0.1" }

and:

{"index":{"_index" : "myindex", "_id" : "MY_SERVER_1" }}   
{"SERVER_NAME":"MY_SERVER_1","OS":"Ubuntu","PROD_2_VERSION":"6.0.0.5" } 
{"index":{"_index" : "myindex", "_id" : "MY_SERVER_2" }}
{"SERVER_NAME":"MY_SERVER_2","OS":"Windows10","PROD_2_VERSION":"7.0.0.0" } 
{"index":{"_index" : "myindex", "_id" : "MY_SERVER_3" }}
{"SERVER_NAME":"MY_SERVER_3","OS":"Fedora","PROD_2_VERSION":"8.5.0.1" }
  • If I run the loads in the given sequence using "index", the property "PROD_2_VERSION" is added, but "PROD_1_VERSION" is lost, because the second load replaces the whole document.
  • If I modify the files to use "update" rather than "index" (wrapping the properties in { "doc" : ... }), the first load fails, because it tries to update a document that does not exist yet.
  • If the first load uses "index" and the second uses "update", it works. However, as mentioned, the order in which the loads run can't be controlled.

Is there a way to make it work like this:

if the record exists
   behave like 'update'
else
   behave like 'index'

???


Solution

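  • I'm not sure I fully understand your use case, but to do an "upsert" (insert or update) in a bulk request to Elasticsearch, use the "update" action and add

    "doc_as_upsert" : true 
    

    after the "doc" part. With this flag, the contents of "doc" are inserted as a new document when the _id does not exist yet, and merged into the existing document when it does.

    Here is the example from the official Elasticsearch documentation: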

    { "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
    { "doc" : {"field" : "value"}, "doc_as_upsert" : true }
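
Applied to the files from the question, each "index" pair would become an "update" pair with "doc_as_upsert". Here is a minimal Python sketch of that rewrite, assuming the files strictly alternate action and source lines as shown in the question. The helper names `to_upsert_lines` and `simulate` are made up for illustration, and `simulate` is only a toy in-memory model for flat documents, not Elasticsearch itself.

```python
import json


def to_upsert_lines(bulk_lines):
    """Rewrite 'index' action/source pairs as 'update' pairs with doc_as_upsert."""
    out = []
    lines = [ln for ln in bulk_lines if ln.strip()]
    # Bulk bodies alternate: one action line, then one source line.
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        source = json.loads(source_line)
        if "index" not in action:
            raise ValueError(f"expected an 'index' action, got: {action_line}")
        out.append(json.dumps({"update": action["index"]}))
        out.append(json.dumps({"doc": source, "doc_as_upsert": True}))
    return out


def simulate(store, upsert_lines):
    """Toy model of update + doc_as_upsert on flat documents (an assumption, not the real engine)."""
    for action_line, body_line in zip(upsert_lines[0::2], upsert_lines[1::2]):
        doc_id = json.loads(action_line)["update"]["_id"]
        body = json.loads(body_line)
        # Missing id: the doc is inserted. Existing id: its fields are merged in.
        store.setdefault(doc_id, {}).update(body["doc"])
    return store


load_1 = [
    '{"index":{"_index":"myindex","_id":"MY_SERVER_1"}}',
    '{"SERVER_NAME":"MY_SERVER_1","OS":"Ubuntu","PROD_1_VERSION":"1.0.0.5"}',
]
load_2 = [
    '{"index":{"_index":"myindex","_id":"MY_SERVER_1"}}',
    '{"SERVER_NAME":"MY_SERVER_1","OS":"Ubuntu","PROD_2_VERSION":"6.0.0.5"}',
]

upserts_1 = to_upsert_lines(load_1)
upserts_2 = to_upsert_lines(load_2)

# Run the two loads in both orders against the toy store.
first_then_second = simulate(simulate({}, upserts_1), upserts_2)
second_then_first = simulate(simulate({}, upserts_2), upserts_1)

for line in upserts_1:
    print(line)
```

In this sketch both orders end with a document that has PROD_1_VERSION and PROD_2_VERSION, which is the behaviour the question asks for. When sending the converted lines to the _bulk endpoint, keep in mind that the body is newline-delimited and must end with a newline, for example `"\n".join(upserts_1) + "\n"`.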