Tags: recursion, transactions, xquery, marklogic, xquery-update

Recursive copy of a folder with XQuery


I have to copy an entire project folder into the MarkLogic server. Instead of doing it manually I decided to do it with a recursive function, but it is becoming the worst idea I have ever had. I'm having problems with the transactions and with the syntax, and being new to this I can't find a proper way to solve it. Here's my code, thank you for the help!

import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";

declare option xdmp:set-transaction-mode "update";

declare function local:recursive-copy($filesystem as xs:string, $uri as xs:string)
{
  for $e in xdmp:filesystem-directory($filesystem)/dir:entry
  return 
    if($e/dir:type/text() = "file")
        then dls:document-insert-and-manage($e/dir:filename, fn:false(), $e/dir:pathname)
    else
      (
          xdmp:directory-create(concat(concat($uri, data($e/dir:filename)), "/")),
          local:recursive-copy($e/dir:pathname, $uri)
      )

};

let $filesystemfolder := 'C:\Users\WB523152\Downloads\expath-ml-console-0.4.0\src'
let $uri := "/expath_console/"

return local:recursive-copy($filesystemfolder, $uri)

Solution

  • MLCP would have been nice to use. However, here is my version:

    declare option xdmp:set-transaction-mode "update";
    
    declare variable $prefix-replace := ('C:/', '/expath_console/');
    
    declare function local:recursive-copy($filesystem as xs:string){
       for $e in xdmp:filesystem-directory($filesystem)/dir:entry
        return 
          if($e/dir:type/text() = "file")
             then 
               let $source := $e/dir:pathname/text()
               let $dest := fn:replace($source, $prefix-replace[1], $prefix-replace[2]) 
               (: xdmp:document-load reads the file at $source from disk; the options set the target URI :)
               let $_ := xdmp:document-load($source,
                  <options xmlns="xdmp:document-load">
                    <uri>{$dest}</uri>
                  </options>)
               return <record>
                         <from>{$source}</from>
                         <to>{$dest}</to>
                      </record>
             else
               local:recursive-copy($e/dir:pathname)
    
    };
    
    let $filesystemfolder := 'C:\Temp'
    
    return <results>{local:recursive-copy($filesystemfolder)}</results> 
    

    Please note the following:

    • I changed my sample to the C:\Temp dir
    • The output is XML only because, by convention, I return results in a form I can analyze afterwards. That is actually how I found the error related to conflicting updates.
    • I chose to define a simple prefix replace on the URIs
    • I saw no need for DLS in your description
    • I saw no need for the explicit creation of directories in your use case
    • You were getting conflicting updates because you were using just the filename as the URI. Across the whole directory structure these names were not unique, so the same URI was inserted twice, which caused the conflicting update.
    • This is not solid code:
      • You would have to ensure that each URI is valid. Not all filesystem paths/names are OK as a URI, so you would want to test for this and escape characters if needed.
      • Large filesystems would time out, so spawning in batches may be useful.
        • As an example, I might gather the list of docs as in my XML and then process that list by spawning a new task for every 100 documents. This could be accomplished with a simple loop over xdmp:spawn-function or with a library such as taskbot by @mblakele.
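
    The last two points could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested implementation: $SOURCE-ROOT, $URI-PREFIX, $BATCH-SIZE and local:list-files are hypothetical names, percent-encoding each path segment with fn:encode-for-uri is just one possible escaping policy, and the transaction-mode option shown is the older form of the eval options (newer MarkLogic releases may prefer a different option for the same purpose). It first gathers <record> elements like the ones in the output above, then spawns one task per batch.

    ```xquery
    xquery version "1.0-ml";

    (: Hypothetical settings: adjust to your own folder and URI layout. :)
    declare variable $SOURCE-ROOT as xs:string := "C:\Temp";
    declare variable $URI-PREFIX as xs:string := "/expath_console/";
    declare variable $BATCH-SIZE as xs:integer := 100;

    (: Walk the folder and return one <record> per file. The URI is built
       from the escaped file and folder names, so files with the same name
       in different folders get different URIs. Nothing is written here. :)
    declare function local:list-files($dir as xs:string, $uri-base as xs:string)
      as element(record)*
    {
      for $e in xdmp:filesystem-directory($dir)/dir:entry
      let $path := fn:string($e/dir:pathname)
      let $uri := fn:concat($uri-base, fn:encode-for-uri(fn:string($e/dir:filename)))
      return
        if ($e/dir:type = "file")
        then <record><from>{$path}</from><to>{$uri}</to></record>
        else local:list-files($path, fn:concat($uri, "/"))
    };

    let $records := local:list-files($SOURCE-ROOT, $URI-PREFIX)
    let $batch-count := xs:integer(fn:ceiling(fn:count($records) div $BATCH-SIZE))
    for $b in 1 to $batch-count
    let $batch := fn:subsequence($records, ($b - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
    return (
      (: Each batch is loaded by its own task in its own transaction. :)
      xdmp:spawn-function(
        function() {
          for $r in $batch
          (: Bind the values before the options constructor: inside it the
             default element namespace is xdmp:document-load, so a path such
             as $r/to would no longer match the <to> element. :)
          let $source := fn:string($r/from)
          let $dest := fn:string($r/to)
          return xdmp:document-load($source,
            <options xmlns="xdmp:document-load">
              <uri>{$dest}</uri>
            </options>)
        },
        <options xmlns="xdmp:eval">
          <transaction-mode>update-auto-commit</transaction-mode>
        </options>),
      <batch number="{$b}" size="{fn:count($batch)}"/>
    )
    ```

    The calling query only lists files and queues tasks, so it stays short even for large folders; the actual loading happens on the task server, one batch per transaction. Failures inside a spawned task do not surface in the calling query, so you would want to check the server's error log after a run.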