Extract the list of nodes between two nodes in XQuery

I am working on an NLP project and I need to extract some information from an XML document. Here is a piece of it. Each item node is a token with its part of speech, tag, lemma...

<file type="titre" name="2017/01/01/19-00-00/0,2-3208,1-0,0.xml">
<p type="description">
<p type="description">

I work on syntactic dependencies. Here you can see that the item nodes are tokens (with part-of-speech tag, etc.). My task is to target each item with a[8]='sub'. After that, I need to extract the words covered by the relation. The item's a[9] holds the index where the syntactic dependency begins. In the first sentence (description node), the sub item is


I need to extract its a[9] (here it is 19). This is the index of the first word of my syntactic dependency, which is this item (matched on its index in a[1])


What do I have to do? Get all the items (in fact their a[2] values) between the index of this word and my item with 'sub'. In the first sentence, the expected output would be

quand le moteur a eu

So it is an extraction of the nodes between two nodes, identified by index. Here is my current code. I can't grab the item nodes that lie between the two items. Be careful: there may be more than one sub item per sentence, so I needed to add a for loop

for $p in /basetalismane/file/*//p
let $items := /$p//item[a[8]='sub']
for $item in $items
let $target := /$item/a[9]
let $source := /$item/a[1]
return (
for $i in ($target to $source)
return string-join( $p/item[$i]/a[2]  , ' '))

I only get each word on its own, not the sequence, and I can't concatenate the strings word by word. I tried return $nodes to see what I grab: it is only the sub items. I want the items in between. I would like a list of items, or a string built from their a[2] values, to get the words. In the second sentence, the expected output would be

que la reconquete autoproclamée de la "capitale" de l'EI était

Thanks for your help. I hope it's clear, but it's hard to explain (I'm French).


  • I think

    declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
    declare option output:method 'text';
    declare option output:item-separator '&#10;';
    for $item in //p[@type = 'description']/item[a[8] = 'sub']
    return
        string-join(
          (: start at the item whose index a[1] equals the sub item's a[9] :)
          $item/parent::p/item[a[1] = $item/a[9]]/
            (let $next := following-sibling::item[a[8] = 'sub'][1]
            return (., following-sibling::item[. << $next], $next))/a[2],
          ' '
        )

    returns

    quand le moteur a eu
    que la reconquête de la " capitale " autoproclamée de l' EI était

    Perhaps windowing or fold-right can also help express it.
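
    Another way to express it, as a rough sketch: since a[1] holds the token index, you can also select by index range instead of walking the sibling axis. The sample p below and the local:item helper are made up for illustration (the real items carry data in a[3] to a[7], and the relation labels other than sub are placeholders). Note that string-join wraps the whole selection here; in the question's attempt it sits inside the for $i loop, so each call only ever sees a single word.

    ```xquery
    xquery version "3.0";

    (: Hypothetical helper: builds an item with nine a children so that
       a[1] = index, a[2] = word, a[8] = relation, a[9] = start index. :)
    declare function local:item(
      $idx as xs:integer, $word as xs:string,
      $rel as xs:string, $gov as xs:integer
    ) as element(item) {
      <item>
        <a>{$idx}</a><a>{$word}</a>
        <a/><a/><a/><a/><a/>
        <a>{$rel}</a><a>{$gov}</a>
      </item>
    };

    (: Made-up sample sentence; only the 'sub' row mirrors the question. :)
    let $p :=
      <p type="description">{
        local:item(18, ',',      'ponct', 17),
        local:item(19, 'quand',  'mod',   23),
        local:item(20, 'le',     'det',   21),
        local:item(21, 'moteur', 'suj',   23),
        local:item(22, 'a',      'aux',   23),
        local:item(23, 'eu',     'sub',   19)
      }</p>
    for $sub in $p/item[a[8] = 'sub']
    let $from := xs:integer($sub/a[9])  (: index of the first word :)
    let $to   := xs:integer($sub/a[1])  (: index of the sub item itself :)
    return
      (: keep every item whose index falls in the range, then join once :)
      string-join($p/item[xs:integer(a[1]) = ($from to $to)]/a[2], ' ')
    (: → "quand le moteur a eu" :)
    ```

    This assumes a[9] is never greater than a[1]; if the dependency can point forwards, swap the two bounds with min and max first.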