xml xpath xslt xml-parsing tei

XPath - how to exclude text from child node


I want this output (example):

I want this

I'm working with an XML/TEI document and need an XPath expression that outputs the text in div/u, but without the text inside child elements such as <desc> or <vocal><desc>, and without the text between two <anchor/> elements (example).

From the code (example):

<div>
<u> 
I want this but 
     *<anchor/><desc>I don't want this</desc><anchor/>
      <anchor/>I don't want this also<anchor/>
     <del type="">I don't want this too</del>*
I want this
</u>
</div>

I tried (example):

TEI//u[not(desc)]

But that excludes every <u> that has a <desc> child, instead of just leaving out the <desc> text.


Solution

  • If I read your requirements as:

    select any text node that is a child of u (i.e. not inside another element such as desc or del), but exclude text nodes that are in-between two anchor elements

    then I arrive at the following expression:

    //u/text()[not(preceding-sibling::*[1][self::anchor] and following-sibling::*[1][self::anchor])]
    

    Applying it to the given input produces:

    " 
    I want this but 
         **
    I want this
    "
    

    which is different from the output you say you want, but nevertheless conforms to the stated requirements.
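
If no XPath engine with full axis support is at hand, the same selection can be approximated by hand. The following is only a sketch in Python's standard library, not part of the original answer: `xml.etree.ElementTree` does not support `text()` or the sibling axes, so the predicate is emulated with the `text` and `tail` attributes. The function name `direct_text_outside_anchors` and the `SAMPLE` string are made up for illustration, and the sample has no namespace; a real TEI file normally declares one, in which case the tag comparisons would need the qualified names.

```python
import xml.etree.ElementTree as ET

# The question's sample input, copied as-is (no TEI namespace assumed).
SAMPLE = """<div>
<u>
I want this but
     *<anchor/><desc>I don't want this</desc><anchor/>
      <anchor/>I don't want this also<anchor/>
     <del type="">I don't want this too</del>*
I want this
</u>
</div>"""


def direct_text_outside_anchors(u):
    """Collect the text that is a direct child of `u`, skipping any
    text that sits between two adjacent <anchor/> siblings."""
    children = list(u)
    pieces = []
    # Text before the first child element has no preceding sibling
    # element, so the "between two anchors" test can never apply to it.
    if u.text:
        pieces.append(u.text)
    for i, child in enumerate(children):
        # `tail` is the text that follows `child` up to the next element;
        # text inside desc/del lives in their own `.text` and is never read.
        if not child.tail:
            continue
        nxt = children[i + 1] if i + 1 < len(children) else None
        between_anchors = (
            child.tag == "anchor" and nxt is not None and nxt.tag == "anchor"
        )
        if not between_anchors:
            pieces.append(child.tail)
    return pieces


root = ET.fromstring(SAMPLE)
for u in root.iter("u"):
    print("".join(direct_text_outside_anchors(u)))
```

As with the XPath expression, the stray asterisks and the surrounding whitespace from the sample are kept, because they are ordinary text children of `u`; wrapping the joined result in `" ".join(text.split())` would be one way to collapse the whitespace if that matters.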