XPath/XQuery procedural grouping


I have an XML file:

<books>
  <title>Moby-Dick</title>
  <author>Herman Melville</author>
  <title>Sunrise Nights</title>
  <author>Jeff Zentner</author>
  <author>Brittany Cavallaro</author>
  <price>14.52€</price>
  <title>My Salty Mary</title>
  <author>Cynthia Hand</author>
  <author>Brodi Ashton</author>
  <author>Jodi Meadows</author>
</books>

I would like to create a new book node every time I encounter a title node, and put every following non-title node into that book. My expected output is:

<books>
  <book>
    <title>Moby-Dick</title>
    <author>Herman Melville</author>
  </book>
  <book>
    <title>Sunrise Nights</title>
    <author>Jeff Zentner</author>
    <author>Brittany Cavallaro</author>
    <price>14.52€</price>
  </book>
  <book>
    <title>My Salty Mary</title>
    <author>Cynthia Hand</author>
    <author>Brodi Ashton</author>
    <author>Jodi Meadows</author>
  </book>
</books>

As XQuery's group by doesn't seem to be useful here, I'm trying to do the grouping in an array using fold-left. (Aside: there might be a simpler method, so feel free to point it out to me.)

Here's what I've written so far:

let $groups := (
  doc("books.xml")/books/* =>
    fold-left((array{}, 0), function($acc, $node) {
      let
        $arr := $acc[1],
        $idx := $acc[2]
      return
        if (name($node) = "title")
        then ($arr => array:append($node), $idx+1)
        else ($arr => array:put($idx, ($arr($idx), $node)), $idx)
    })
  )[1]
return
  <books>{
    for $nodes in $groups
    return <book>{$nodes}</book>
  }</books>

However, this wrongly puts everything into a single book:

<books>
  <book>
    <title>Moby-Dick</title>
    <author>Herman Melville</author>
    <title>Sunrise Nights</title>
    <author>Jeff Zentner</author>
    <author>Brittany Cavallaro</author>
    <price>14.52€</price>
    <title>My Salty Mary</title>
    <author>Cynthia Hand</author>
    <author>Brodi Ashton</author>
    <author>Jodi Meadows</author>
  </book>
</books>

POSTSCRIPT

I found the error in my code: $groups is a single array item, so for $nodes in $groups return ... runs only once and binds the whole array to $nodes. The iteration should be $groups => array:for-each(function($nodes){...}) instead.
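
For reference, here is a sketch of the original fold-left query with only the final iteration changed to array:for-each. It assumes an XQuery 3.1 processor (arrays and the arrow operator) and the same books.xml as above. I have not run it against a specific processor.

```xquery
(: Sketch: same fold-left grouping as in the question, with the
   final loop replaced by array:for-each. Assumes XQuery 3.1. :)
let $groups := (
  doc("books.xml")/books/* =>
    fold-left((array{}, 0), function($acc, $node) {
      let
        $arr := $acc[1],  (: array holding one member per book so far :)
        $idx := $acc[2]   (: position of the book currently being filled :)
      return
        if (name($node) = "title")
        (: a title starts a new array member :)
        then ($arr => array:append($node), $idx+1)
        (: any other node is added to the current member :)
        else ($arr => array:put($idx, ($arr($idx), $node)), $idx)
    })
  )[1]
(: visit each array member separately instead of the array as one item :)
return
  <books>{
    $groups => array:for-each(function($nodes) { <book>{$nodes}</book> })
  }</books>
```

Note that this still assumes the first child of books is a title: a non-title node appearing before any title would call array:put with position 0, which raises an error.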
For a simpler method than fold-left, see the accepted answer.


Solution

  • If you have the option of using XSLT 2.0+, use:

    <xsl:template match="books">
      <books>
         <xsl:for-each-group select="*" 
                    group-starting-with="title">
           <book>
              <xsl:copy-of select="current-group()"/>
           </book>
         </xsl:for-each-group>
      </books>
    </xsl:template>
    

    In XQuery 3.0+, it can be done with the tumbling window clause of a FLWOR expression:

    <books>{
      for tumbling window $w in doc("books.xml")/books/*
        start $s when $s[self::title]
      return <book>{$w}</book>
    }</books>
    

    Not tested.