I am trying to select the text of the <li> ancestor immediately above the parent <li> of each <a>. Here is a sample of the document, im-trg.xml:
<trg>
<category>
<h2>Accounting and Auditing</h2>
<ul>
<li>Laws and Regulations
<ul>
<li><a href="url1">Regulation S-X</a></li>
</ul>
</li>
<li>Staff Guidance
<ul>
<li>No Action Letters
<ul>
<li><a href="url2">Robert Van Grover, Esq., Seward and Kissel LLP</a> (November 5, 2013)</li>
</ul>
</li>
</ul>
</li>
</ul>
</category>
</trg>
Here is my query:
for $x in doc("C:\im-trg.xml")//li/a
return
<item>
<title>{data($x)}</title>
<documentType>{data($x/ancestor::li[2])}</documentType>
<category>{data($x/ancestor::category/h2)}</category>
</item>
I am getting:
<item>
<title>Regulation S-X</title>
<documentType>Laws and RegulationsRegulation S-X</documentType>
<category>Accounting and Auditing</category>
</item>
For <documentType>, I want only the text of the <li> ancestor immediately above the parent <li> of the <a>, which indicates the type of document, so I want:
<item>
<title>Regulation S-X</title>
<documentType>Laws and Regulations</documentType>
<category>Accounting and Auditing</category>
</item>
and
<item>
<title>Robert Van Grover, Esq., Seward and Kissel LLP</title>
<documentType>No Action Letters</documentType>
<category>Accounting and Auditing</category>
</item>
I don't think I can come down from the root because the parent <li> is sometimes double nested and sometimes triple nested.
The string value of an element is the concatenation of all its descendant text nodes, which is why the link text "Regulation S-X" is appended to "Laws and Regulations" in your output. If you only want the text directly contained in the element, select its text-node children explicitly, e.g.:
data($x/ancestor::li[2]/text())
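Note that each outer <li> in the sample has more than one text-node child: the label itself plus the whitespace around the nested <ul>. Here is a minimal sketch of the full query that joins those text nodes and trims the whitespace. It assumes the file can be resolved as "im-trg.xml" relative to the query, and the $type variable is only an illustrative name:

```xquery
(: Sketch only: the document path and the $type variable are assumptions. :)
for $x in doc("im-trg.xml")//li/a
(: ancestor::li[1] is the parent <li>; ancestor::li[2] is the <li> above it. :)
let $type := $x/ancestor::li[2]
return
  <item>
    <title>{normalize-space($x)}</title>
    {(: Join only the direct text-node children, then collapse whitespace. :)}
    <documentType>{normalize-space(string-join($type/text(), ""))}</documentType>
    <category>{normalize-space($x/ancestor::category/h2)}</category>
  </item>
```

Against the sample document this should give the two <item> elements you listed. Because ancestor::li[2] counts upward from the <a>, it does not matter whether the parent <li> is nested two or three levels deep.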