
How to modify an in-memory document in eXist-db?


I would like to know how to modify an in-memory copy of an original document stored in the DB. I am very happy with the update extension, which lets me search and replace in text nodes and change them permanently. However, that is not always what I want: on some occasions I need to export the document with minor changes made on the fly. eXist does not seem to support copy, which is what I had in mind.

For permanent changes I use:

declare function cust-utils:replace-spaces-hard($document as xs:string) as empty-sequence() {
    let $doc := doc($document)/tei:TEI
    let $match := '(^|\s| )([szkvaiouSZKVAIOU])[\s]'
    for $i in (1 to 2)
    return
        for $text in $doc//text()
        return
            update value $text[matches(., $match)] with replace($text, $match, '$1$2 ')
};

(I iterate twice because XPath 2.0 regular expressions do not seem to support lookarounds, and the matches here sometimes overlap.)

How can I do the same temporarily? I tried the interesting function from Datypic, but it only returns particular nodes. I need to preserve the document order. Simply put, I need to walk the document tree, replace particular strings, and return the document for later use as it is, without updating it in the DB.

UPDATE

Unfortunately, this:

declare function cust-utils:copy($input as item()*) as item()* {
    for $node in $input
    return $node
};

does absolutely the same as

declare function cust-utils:copy($input as item()*) as item()* {
for $node in $input
   return 
      typeswitch($node)
        case element()
           return
              element { name($node) } {
                for $att in $node/@*
                   return
                      attribute { name($att) } { $att }
                ,
                (: output all the sub-elements of this element recursively :)
                for $child in $node
                   return cust-utils:copy($child/node())
              }
        default return $node
};

… It seems to return the document node without actually traversing it.


Solution

  • eXist's XQuery Update extension writes all updates to the database and does not support in-memory operations. This is in contrast to the W3C XQuery Update Facility 1.0+, which eXist does not support. Thus, in eXist, in-memory updates must be performed with pure XQuery, i.e., without the additional syntax and functionality of a formal update facility.

    For in-memory updates with eXist, the traditional path is to perform an "identity transformation", typically using recursive typeswitch operations; see https://en.wikipedia.org/wiki/Identity_transform#Using_XQuery. A simple example showing transformation of text nodes, while preserving document order, is:

    xquery version "3.0";
    
    declare function local:transform($nodes as node()*) {
        for $node in $nodes
        return
            typeswitch ($node)
            case document-node() return 
                local:transform($node/node())
            case element() return 
                element {node-name($node)} {
                    $node/@*, 
                    local:transform($node/node())
                }
            case text() return 
                (: upper-case the whole text node; return a text node, not a bare string :)
                text { upper-case($node) }
            (: drop comment & processing-instruction nodes :)
            default return 
                ()
    };
    
    let $node := 
        document {
            element root {
                comment { "sample document" },
                element x {
                    text { "hello" },
                    element y {
                        text { "there" }
                    },
                    text { "friend" }
                }
            }
        }
    return 
        <results>
            <before>{$node}</before>
            <after>{local:transform($node)}</after>
        </results>
    

    The result:

    <results>
        <before>
            <root>
                <!--sample document-->
                <x>hello<y>there</y>friend</x>
            </root>
        </before>
        <after>
            <root>
                <x>HELLO<y>THERE</y>FRIEND</x>
            </root>
        </after>
    </results>
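
    Applied to the question's case, a minimal sketch might look like the following. It assumes that the space-like character in the original pattern and replacement is a non-breaking space (written here as &#160;); the function name local:replace-spaces, its parameters, and the sample paragraph are illustrative and not part of the original code. The document-node() branch matters because doc() returns a document node: a typeswitch without that branch falls through to its default clause, which in the question's copy function returns the stored node untouched.

    ```xquery
    xquery version "3.0";

    (: Identity transform that rewrites only text nodes and copies everything else.
       local:replace-spaces is a hypothetical name used for illustration. :)
    declare function local:replace-spaces(
        $nodes as node()*,
        $match as xs:string,
        $replacement as xs:string
    ) as node()* {
        for $node in $nodes
        return
            typeswitch ($node)
            case document-node() return
                document { local:replace-spaces($node/node(), $match, $replacement) }
            case element() return
                element { node-name($node) } {
                    $node/@*,
                    local:replace-spaces($node/node(), $match, $replacement)
                }
            case text() return
                (: two passes, because overlapping matches are not all caught in one :)
                text { replace(replace($node, $match, $replacement), $match, $replacement) }
            (: keep comments and processing instructions unchanged :)
            default return
                $node
    };

    (: ASSUMPTION: &#160; stands in for the non-breaking space of the original pattern :)
    let $match := '(^|\s|&#160;)([szkvaiouSZKVAIOU])\s'
    (: sample in-memory document, for illustration only :)
    let $doc := document { <p>Byl v lese a <hi>s ním</hi> i pes.</p> }
    return
        local:replace-spaces($doc, $match, '$1$2&#160;')
    ```

    For a stored document, pass doc($document) instead of the inline sample; the result is a new in-memory tree, and the copy in the database is not modified.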
    

    An alternate approach is to use an in-memory update module, such as Ryan J. Dew's "XQuery XML Memory Operations" module, at https://github.com/ryanjdew/XQuery-XML-Memory-Operations. If you clone the repository (or download the repository's .zip file and unzip it) and upload the folder to eXist's /db collection, the following code will work (adapted from this old exist-open post: http://markmail.org/message/pfvu5omj3ctfzrft):

    xquery version "3.0";
    
    import module namespace mem="http://maxdewpoint.blogspot.com/memory-operations" 
        at "/db/XQuery-XML-Memory-Operations-master/memory-operations-pure-xquery.xqy";
    
    let $node := <x>hello</x>
    let $copy := mem:copy($node)
    let $rename := mem:rename($copy, $node, fn:QName("foo", "y"))
    let $replace-value := mem:replace-value($rename, $node, "world")
    return
        mem:execute($replace-value) 
    

    The result:

    <y xmlns="foo">world</y>