
" version conflict, current version is different than the one provided" when running update_by_query curl in php script


I have to update some fields in my ES documents.

I have an integer 'objectID' field, which is a unique ID of the object the document is about.

I have a String 'objectType' field, which is the type of object concerned by the document.

Every document describes an action on an object, and objectType and objectID are always present.

Unfortunately, some documents with the objectType "post_image" have been indexed as "post". The objectID is still unique and valid, and only a single type of document has the wrong objectType. Therefore, every affected object has at least one other document with the right objectType and the same unique objectID.

I want to use an update_by_query to update the value of the objectType to "post_image" on all documents where the objectType is "post" and the objectID is in any other document where the objectType is "post_image".

Here's my pseudo-code script:

{
"query": {
    "match" : { "objectType" : "post" } //all documents with objectType post
},
"script": {
    "lang": "painless",
  "source": "
    // subquery selecting all objectIDs from documents with objectType "post_image"
    subQueryResults = "query": {
        "match" : { "objectType" : "post_image" }
        // I don't know how to filter the results to retrieve the objectID field only
        // no need for help here, I'll figure it out myself
    }
    if (/* ctx._source['objectID'] in subQueryResults */) {
        ctx._source['objectType'] = "post_image"
    }
  "
}
}

I'm new to Painless scripting and I have no idea how to put another query inside my script to get a list of all "post_image" IDs. I know I can pass parameters to a script, but I don't know whether or how I can use a query result in them.

Thanks!

EDIT:

I've solved part of my problem by extracting a CSV list of the affected objectIDs with Kibana's raw export. I wrote a PHP script that reads each objectID and puts it in the query of an update_by_query request, which simply finds ALL documents with a matching objectID and sets their objectType field to "post_image".

I'm using PHP curl to make these calls, and I get version conflict errors despite setting "conflicts": "proceed" in my request. I've tested the very same query in the Kibana dev console and it works perfectly, and I couldn't find any explanation for why it doesn't update my documents when run from PHP.

Here's the script:

<?php
$query = "";
$csvFile = file($argv[1]);
try{
        //$data = array();
    $query = "";
    $i = 0;
    $csv_headers = array();

    $uri = "http://ip/index/type/_update_by_query";

    $conn = curl_init();
    curl_setopt($conn, CURLOPT_URL, $uri);
    curl_setopt($conn, CURLOPT_TIMEOUT, 5);
    curl_setopt($conn, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($conn, CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($conn, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($conn, CURLOPT_FAILONERROR, FALSE);
    curl_setopt($conn, CURLOPT_CUSTOMREQUEST, strtoupper('POST'));
    curl_setopt($conn, CURLOPT_FORBID_REUSE, 0);

    foreach ($csvFile as $line) {
        try{    
            //WARNING: the separator parameter of the str_getcsv call is a risk of error, depending on the type of CSV used.
            //skip header in CSV
            if ($i > 0){
                $data = str_getcsv($line,',');
                    //$data = explode(",", $line);
                $id = $data[0];
                echo $id.", ";
            //old query, wasn't working
            //     $query = "{
            //         \"conflicts\": \"proceed\",
            //         \"query\": {
            //             \"match\" : { \"objectID\" : ".$id."
            //         }
            //     },
            //     \"script\": {
            //         \"lang\": \"painless\",
            //         \"source\": \"ctx._source['objectType'] = '".$argv[2]."'\"
            //     }
            // }";
                $query = "{
                    \"conflicts\": \"proceed\",
                    \"query\": {
                       \"bool\": {
                        \"must\": {
                            \"match\": {
                                \"objectType\": \"Post\"
                            }
                        },
                        \"filter\": {
                            \"terms\": {
                                \"objectID\": [
                                    ".$id."
                                ]
                            }
                        }
                    }
                },
                \"script\": {
                    \"lang\": \"painless\",
                    \"source\": \"ctx._source['objectType'] = 'Post_image'\"
                }
            }";

            curl_setopt($conn, CURLOPT_HTTPHEADER, array(
                'Content-Type: application/json',
                'Content-Length: ' . strlen($query))
        );
            curl_setopt($conn, CURLOPT_POSTFIELDS, json_encode($query));
            $response = curl_exec($conn);
            //sleep(1);
            echo $response;
        }
        $i++;
        } catch (Exception $e) {
            echo $e->getMessage();
            //continue;
        }
    } // end foreach
} catch (Exception $e) {
    echo $e->getMessage();
}
echo $query;
echo "\nCompleted.\n\n";
?>

Example response:

{
  "index": "index",
  "type": "type",
  "id": "AWB0YFcjAFB9uQAwMSKx",
  "cause": {
    "type": "version_conflict_engine_exception",
    "reason": "[type][AWB0YFcjAFB9uQAwMSKx]: version conflict, current version [27] is different than the one provided [26]",
    "index_uuid": "yOD9SBy0RMmDZGK_N5o8qw",
    "shard": "2",
    "index": "index"
  },
  "status": 409
}

This is pretty weird, since I'm not passing any document version in my request. Perhaps it has something to do with some automatic internal behaviour of the update_by_query API.


Solution

  • I finally fixed the whole thing.

    First off, I reworked my query a bit:

    // Author's note on the "objectType" match below: "more optimal!"
    $query = "{ \"query\": {
                           \"bool\": {
                            \"must\": {
                                \"match\": {
                                    \"objectType\": \"Post\"
                                }
                            },
                            \"filter\": {
                                \"term\": {
                                    \"objectID\":
                                        \"".$id."\"
                                }
                            }
                        }
                    },
                    \"script\": {
                        \"lang\": \"painless\",
                        \"source\": \"ctx._source['objectType'] = '".$argv[2]."'\"
                    }
                }";
    

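    $argv[2] is the objectType I want to give to my documents ("Post_image").

    Then I had to remove the json_encode($query) call on the line before curl_exec. $query is already a JSON string, so encoding it a second time sends Elasticsearch a quoted string literal instead of the request object: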

    curl_setopt($conn, CURLOPT_POSTFIELDS, $query);
            $response = curl_exec($conn);
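
To illustrate why the extra json_encode mattered, here is a minimal sketch that builds the request body as a PHP array and encodes it exactly once. The helper name buildUpdateBody and the sample ID 42 are hypothetical and only serve the illustration. The sketch also passes the new type through the script's params rather than concatenating it into the Painless source, which is a swapped-in variation and not what the script above does. Note too that the original script computed Content-Length from the unencoded string, so the header no longer matched the double-encoded body.

```php
<?php
// Hypothetical helper (not part of the original script): builds the
// _update_by_query body as a PHP array and JSON-encodes it exactly once.
function buildUpdateBody(int $objectId, string $newType): string
{
    $body = [
        'query' => [
            'bool' => [
                'must'   => ['match' => ['objectType' => 'Post']],
                'filter' => ['term'  => ['objectID' => $objectId]],
            ],
        ],
        'script' => [
            'lang'   => 'painless',
            // Assumption: the new value is passed via params instead of
            // being concatenated into the script source.
            'source' => 'ctx._source.objectType = params.newType',
            'params' => ['newType' => $newType],
        ],
    ];
    return json_encode($body);
}

$once  = buildUpdateBody(42, 'Post_image'); // 42 is a made-up sample ID
$twice = json_encode($once);                // what the original script effectively sent

var_dump(is_array(json_decode($once, true)));   // bool(true): decodes to a JSON object
var_dump(is_string(json_decode($twice, true))); // bool(true): decodes to a plain string
```

With a body built this way, $once can be handed to CURLOPT_POSTFIELDS as-is, and the manual Content-Length header can be dropped because curl computes it for a string body.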
    

    After that the errors stopped, BUT I got lots of empty results. That was weird, because the same query returned results in the Kibana dev tools. Then I realised I was using the wrong IP: I was sending everything to another up-and-running test ES that had the same index and types but no actual documents in the index, hence the empty results without any errors. I felt a little dumb.
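
A quick check on each response could have flagged the wrong-host problem earlier. The sketch below assumes the response carries the total, updated and version_conflicts counters that update_by_query normally reports. The helper name diagnoseUpdateResponse and the sample responses are made up for illustration.

```php
<?php
// Hypothetical helper: turns a raw _update_by_query response into a
// one-line diagnosis so that "matched nothing" stands out from "updated".
function diagnoseUpdateResponse(string $rawResponse): string
{
    $response = json_decode($rawResponse, true);
    if (!is_array($response)) {
        return 'unparseable response';
    }
    $total     = $response['total'] ?? 0;
    $updated   = $response['updated'] ?? 0;
    $conflicts = $response['version_conflicts'] ?? 0;

    if ($total === 0) {
        // Nothing matched: possibly the wrong host, index or query.
        return 'no documents matched: check the host, index and query';
    }
    if ($conflicts > 0) {
        return "updated $updated of $total, $conflicts version conflict(s)";
    }
    return "updated $updated of $total";
}

// Made-up sample responses, trimmed to the fields the helper reads.
echo diagnoseUpdateResponse('{"total":0,"updated":0,"version_conflicts":0,"failures":[]}'), "\n";
echo diagnoseUpdateResponse('{"total":3,"updated":2,"version_conflicts":1,"failures":[]}'), "\n";
```

Calling it on $response right after curl_exec would make an empty test index show up as "no documents matched" instead of silently scrolling past.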

    PS: FEATURE REQUEST: facepalm emoji.