I have written af stored procedure for adding a Type property to all documents in a DocumentDB collection. Unfortunately the stored procedure fails after updating just one document. The collection contains around 5000 documents.
Here is the stored procedure:
function updateSproc() {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var responseBody = {
updated: 0,
continuation: true,
error: "",
log: ""
};
// Validate input.
tryQueryAndUpdate();
// Recursively queries for a document by id w/ support for continuation tokens.
// Calls tryUpdate(document) as soon as the query returns a document.
function tryQueryAndUpdate(continuation) {
var query = { query: "SELECT * FROM root c WHERE NOT is_defined(c.Type)", parameters: []};
var requestOptions = { continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function(err, documents, responseOptions) {
if (err) {
responseBody.error = err;
throw err;
}
if (documents.length > 0) {
// If documents are found, update them.
responseBody.log += "Found documents: " + documents.length;
tryUpdate(documents);
} else if (responseOptions.continuation) {
responseBody.log += "Continue query";
tryQueryAndUpdate(responseOptions.continuation);
} else {
responseBody.log += "No more documents";
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - throw an exception.
if (!isAccepted) {
responseBody.log += "Query not accepted";
response.setBody(responseBody);
}
}
// Updates the supplied document according to the update object passed in to the sproc.
function tryUpdate(documents)
{
if (documents.length > 0) {
responseBody.log += "Updating documents " + documents.length;
var document = documents[0];
// DocumentDB supports optimistic concurrency control via HTTP ETag.
var requestOptions = { etag: document._etag};
document.Type="Type value";
// Update the document.
var isAccepted = collection.replaceDocument(document._self, document, requestOptions, function(err, updatedDocument, responseOptions) {
if (err) {
responseBody.error = err;
throw err;
}
responseBody.updated++;
documents.shift();
tryUpdate(documents);
});
// If we hit execution bounds - throw an exception.
if (!isAccepted) {
responseBody.log += "Update not accepted";
response.setBody(responseBody);
}
} else {
tryQueryAndUpdate();
}
}}
Based on the response returned I can see that the query returns 100 documents. tryUpdate is called twice but the second call to replaceDocument is not accepted. Why is it not accepted when there are many documents to update?
As per my answer on to the same question MSDN
Yes, 700RUs + (estimated) 20RUs per insert, on a collection that only allows 250RUs per sec is going to be a problem. The query is 700RUs because you're doing a NOT operation, which is effectively a scan because that can't be indexed.
So some things to try;
1) Change the logic to exclude the NOT is_defined check and perhaps Order By _ts DESC to get the docs that were updated last first. That might be cheaper than doing the NOT check. Then you could check each doc you got if it had a Type property already, if not add one and ReplaceDocument
2) You could also try scaling the collection up to an S3 while you are doing this operation, and then scale it back down to an S1 again. That will give you 2500 RUs to play with.
3) Even using an S3, you might still run in to this, it might just happen after more docs than the 2nd one.
So, to fix I would execute a query in an app to return just the id of records that didn't have the property defined,
SELECT VALUE c.id FROM c WHERE NOT is_defined(c.Type)
Stick those ids in to a list / array of some sort and then .Take() items from the list and pass to sproc as an array. Now have the sproc loop through the passed array doing a ReadDocument by id, update and replace and increment counter.
When isAccepted returns false, set the response body to the value of the counter and return to calling code. Now the calling code can then Skip(counter).Take(x) and call the sproc again.
Take a look at this sample for an example of how to do bulk insert via a stored proc. This shows how to batch records, exec a sproc, and get the current position the sproc got to in that batch before isAccepted == false from the response body.