MarkLogic commit frame/return sequence guarantee


I have a simple one-node MarkLogic server from which I need to purge documents daily.

The test query below selects the documents, then returns a sequence for each one that I want to do the following:

  1. output the name of the file being extracted
  2. ensure the directory path of the file in #1 exists
  3. save a zipped version of the document to the file in #1
  4. delete the document

Is this structure safe? It returns a sequence for each document to be deleted. The last item in the returned sequence deletes the document. If any of the prior steps fail, will the document still be deleted? Should I trust the engine to execute the return sequence in order given?

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
let $dateLimitAll := current-dateTime() -xs:dayTimeDuration("P1460D")
let $dateLimitSome := current-dateTime() -xs:dayTimeDuration("P730D")
for $adoc in doc()[1 to 5]
 let $docDate := $adoc/Unit/created
 let $uri := document-uri($adoc)
 let $path:=  fn:concat("d:/purge/" , $adoc/Unit/xmldatastore/state/data(), "/", fn:year-from-dateTime($docDate), "/", fn:month-from-dateTime($docDate))
 let $filename :=  fn:concat($path, "/", $uri, ".zip")
where (
  ($docDate < $dateLimitAll)
  or (($docDate < $dateLimitSome)
      and ($adoc/Unit/xmldatastore/state != "FIRMED")
      and ($adoc/Unit/xmldatastore/state != "SHIPPED"))
)
return (
  $filename,
  xdmp:filesystem-directory-create($path, map:new(map:entry("createParents", fn:true()))),
  xdmp:save($filename, xdmp:zip-create(<parts xmlns="xdmp:zip"><part>{$uri}</part></parts>, doc($uri))),
  xdmp:document-delete($uri)
)

p.s. please ignore the [1 to 5] doc limit. Added for testing.


Solution

  • If any of the prior steps fail, will the document still be deleted?

    If there is an error in the execution of that module, the transaction will roll back and the delete from the database will be undone.

    However, the directory and zip file written to the filesystem will persist and will not be deleted. The effects of xdmp:filesystem-directory-create() and xdmp:save() are not rolled back or undone when a transaction rolls back.
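
    The asymmetry described above can be seen with a small experiment. This is only a sketch: the URI, the output path, and the error name PURGE-FAILED are hypothetical, and it assumes the d:/purge directory already exists and the document is present in the database.

    ```xquery
    xquery version "1.0-ml";

    (: Hypothetical URI and path, used only for illustration. :)
    let $uri := "/example/unit-1.xml"
    let $filename := "d:/purge/example-unit-1.zip"
    return (
      (: Filesystem side effect: not part of the database transaction. :)
      xdmp:save($filename,
        xdmp:zip-create(
          <parts xmlns="xdmp:zip"><part>{$uri}</part></parts>,
          fn:doc($uri))),
      (: Database update: only takes effect if the transaction commits. :)
      xdmp:document-delete($uri),
      (: Simulated failure, so the transaction rolls back. :)
      fn:error(xs:QName("PURGE-FAILED"), "simulated failure")
    )
    ```

    After running this, fn:doc("/example/unit-1.xml") should still return the document, while the zip file would be expected to remain on disk.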

  • Should I trust the engine to execute the return sequence in order given?

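    It doesn't matter much, given the answer above: the delete only commits if the whole module finishes without error, whatever order the items in the sequence are evaluated in.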

  • Is this structure safe?

    It is unclear how many documents you might be dealing with. You may find that the filter is better/faster using cts:search and some indexes to target the candidate documents. Also, even if you can select the set of documents to process faster, if there are a lot of documents, you could still exceed execution time limits.
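
    As a rough sketch of the index-driven selection, the filter from the question could be expressed as a cts query and resolved against the URI lexicon. This assumes things the question does not state: that the URI lexicon is enabled, that a dateTime element range index exists on created, and that the element names created and state only occur at the paths used in the question. Note also that the cts:not-query below matches documents with no state element at all, which the original != comparison would not.

    ```xquery
    xquery version "1.0-ml";

    let $dateLimitAll := fn:current-dateTime() - xs:dayTimeDuration("P1460D")
    let $dateLimitSome := fn:current-dateTime() - xs:dayTimeDuration("P730D")
    let $query :=
      cts:or-query((
        (: Older than the hard limit: always a candidate. :)
        cts:element-range-query(xs:QName("created"), "<", $dateLimitAll),
        (: Older than the soft limit and not FIRMED or SHIPPED. :)
        cts:and-query((
          cts:element-range-query(xs:QName("created"), "<", $dateLimitSome),
          cts:not-query(
            cts:element-value-query(xs:QName("state"), ("FIRMED", "SHIPPED"), "exact"))
        ))
      ))
    (: Returns only the URIs of the candidates, resolved from indexes. :)
    return cts:uris((), (), $query)
    ```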

    Another approach might be to break up the work. Select the URIs of the documents that match the criteria, and then run a separate query execution for each document, each one responsible for saving the zip file and deleting that document from the database. This is likely to be faster, as you can process multiple documents in parallel. It also avoids the risk of a timeout and, in the event of an exception, allows some items to fail without causing the entire set to fail and roll back.
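
    A minimal sketch of the per-document idea, using xdmp:invoke-function so that each document is archived and deleted in its own transaction. The helper local:purge-one, the example URIs, and the file naming are hypothetical, and the <update>true</update> option assumes MarkLogic 9 or later. This version runs the documents one after another; parallelism would come from something like xdmp:spawn-function or a batch tool.

    ```xquery
    xquery version "1.0-ml";
    declare namespace error = "http://marklogic.com/xdmp/error";

    (: Hypothetical helper: zip one document to disk, then delete it,
       in a transaction separate from the calling statement. :)
    declare function local:purge-one($uri as xs:string, $filename as xs:string)
    {
      xdmp:invoke-function(
        function() {
          xdmp:save($filename,
            xdmp:zip-create(
              <parts xmlns="xdmp:zip"><part>{$uri}</part></parts>,
              fn:doc($uri))),
          xdmp:document-delete($uri)
        },
        <options xmlns="xdmp:eval">
          <isolation>different-transaction</isolation>
          <update>true</update>
        </options>)
    };

    (: Hypothetical URIs; in practice these would come from the selection query. :)
    for $uri in ("/example/unit-1.xml", "/example/unit-2.xml")
    let $filename := fn:concat("d:/purge/", fn:replace($uri, "/", "_"), ".zip")
    return
      try {
        local:purge-one($uri, $filename),
        fn:concat("purged ", $uri)
      } catch ($e) {
        (: A failure here only rolls back this document's transaction. :)
        fn:concat("failed ", $uri, ": ", $e/error:message)
      }
    ```

    Because each call commits on its own, a failure on one document leaves the others purged. A zip file written before a failed delete would still remain on disk.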

    Tools such as CoRB were built exactly for this type of batch work.