xml xpath basex flwor

BaseX XQuery: merging content of elements with the same tag name, regardless of location in document


For example, given XML:

<root>
    <item>
        <id>111</id>
        <description>aisle 12, shelf 3</description>
        <description>inside the box</description>
    </item>
</root>

I would like the result:

<root>
    <item>
        <id>111</id>
        <description>aisle 12, shelf 3 inside the box</description>
    </item>
</root>

But the node may have any name, and be at any level. I would like the same query to work with different XML, as long as the tag is repeated:

<root>
    <item>
        <id>112</id>
        <attributes>
            <author>Joe Smith</author>
            <author>Arthur Clarke</author>
            <author>Jeremiah Wright</author>
        </attributes>
    </item>
</root>

Output:

<root>
    <item>
        <id>112</id>
        <attributes>
            <author>Joe Smith Arthur Clarke Jeremiah Wright</author>
        </attributes>
    </item>
</root>

Is this possible with BaseX? If not, can it be done for a known element (for example, only for /root/item/attributes/author)?


Solution

  • Merging only directly adjacent siblings complicates things a little. The comments in the code below explain how it works.

    let $xml := document{<root>
        <item>
            <id>112</id>
            <attributes>
                <author>Joe Smith</author>
                <author>Arthur Clarke</author>
                <author>Jeremiah Wright</author>
                <foo/>
                <author>Donald Duck</author>
            </attributes>
        </item>
    </root>}
    return
      (: Use an XQuery Update transformation :)
      copy $copy := $xml
      modify (
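        (: Loop over all leaves (elements without child elements, so only text). :)
        (: This might have to be adjusted if you want to merge arbitrary nodes. :)
        for $leaf in $copy//*[not(*)]
        (: Keep only leaves whose immediately preceding sibling has a different name; :)
        (: a leaf preceded by a same-named sibling is merged into that sibling's run. :)
        (: [1] selects the nearest preceding sibling, the second predicate tests its name. :)
        where not($leaf/preceding-sibling::*[1][name(.) eq name($leaf)])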
        (: Now find following siblings... :)
        let $siblings := $leaf/following-sibling::*[
          (: ... of the same name ... :)
          name(.) eq name($leaf) and
          (: ... and that do not have a node with another name in-between :)
          not(preceding-sibling::*[name(.) != name($leaf) and $leaf << .])
        ]
        return (
          (: Merge text contents into $leaf :)
          replace value of node $leaf with string-join(($leaf, $siblings), ' '),
          (: And delete all others :)
          delete nodes $siblings
        )
      )
      return $copy
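
The question also asks about the simpler case where the element is known in advance (only /root/item/attributes/author). A minimal sketch for that case, assuming it is acceptable to merge all author elements under the same parent even when other elements sit between them, could look like this. It uses the same XQuery Update copy/modify approach as the query above; the variable names $parent and $authors are only illustrative.

```xquery
(: Sample input from the question; any document with the same structure should work. :)
let $xml := document {
  <root>
    <item>
      <id>112</id>
      <attributes>
        <author>Joe Smith</author>
        <author>Arthur Clarke</author>
        <author>Jeremiah Wright</author>
      </attributes>
    </item>
  </root>
}
return
  copy $copy := $xml
  modify (
    (: Visit each parent on the known path that has at least one author :)
    for $parent in $copy/root/item/attributes[author]
    let $authors := $parent/author
    return (
      (: Put the joined text of all authors into the first one :)
      replace value of node $authors[1] with string-join($authors, ' '),
      (: Delete the remaining authors; adjacency is not checked here (assumption) :)
      delete nodes subsequence($authors, 2)
    )
  )
  return $copy
```

Because the path is fixed, no sibling comparison is needed. If only directly adjacent elements should be merged, the generic query above is the one to use.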