DOCUMENT: http://en.wikiquote.org/wiki/The_Matrix
I want to get all quotes (//ul/li) in the first section (Neo's quotes).
I cannot simply use //ul[1]/li,
because on some Wikiquote pages the quotes are represented in this form:
<h2><span class="mw-headline" id="Neo">Neo</span></h2>
<ul>
<li> First quote </li>
</ul>
<ul>
<li> Second quote </li>
</ul>
<h2><span class="mw-headline" id="dont wanna this">Useless</span></h2>
rather than in this form:
<ul>
<li> First quote </li>
<li> Second quote </li>
</ul>
I've tried this to get the first section:
(//*[@id='mw-content-text']/ul/preceding-sibling::h2/span[@class='mw-headline'])[1]
but I'm having trouble getting only the quotes of the first section. Can you help me?
Use:
(//h2[span/@id='Neo'])[1]/following-sibling::ul
[count(.
|
(//h2[span/@id='Neo'])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
=
count((//h2[span/@id='Neo'])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
]
/li
This selects all li children of the ul elements that lie between the first h2 having a span child whose id attribute is "Neo" and the next h2.
To select the quotations for the second such h2, simply replace 1 with 2 in all three occurrences of (//h2[span/@id='Neo'])[1] in the above expression.
Do this for every number 1, 2, ..., count(//h2[span/@id='Neo']).
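To make the selection logic concrete outside XSLT, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. ElementTree has no following-sibling or preceding-sibling axis, so the sketch works on sibling positions instead. The function name section_quotes and the extra "Not wanted" list are hypothetical, and it assumes (as in the sample markup) that the h2 and ul elements share one parent.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, plus one extra list after the second h2.
DOC = """<html>
<h2><span class="mw-headline" id="Neo">Neo</span></h2>
<ul><li> First quote </li></ul>
<ul><li> Second quote </li></ul>
<h2><span class="mw-headline" id="dont wanna this">Useless</span></h2>
<ul><li> Not wanted </li></ul>
</html>"""


def section_quotes(root, section_id, k=1):
    """Hypothetical helper mirroring the XPath above: return the li texts
    of the ul siblings between the k-th matching h2 and the next h2.
    Assumes the h2/ul elements are all children of `root`."""
    children = list(root)
    # h2[span/@id=section_id]
    heads = [i for i, el in enumerate(children)
             if el.tag == "h2"
             and any(s.get("id") == section_id for s in el.findall("span"))]
    if len(heads) < k:
        return []
    start = heads[k - 1]
    # $ns1: following-sibling::ul of the h2 of interest
    following = {i for i in range(start + 1, len(children))
                 if children[i].tag == "ul"}
    # following-sibling::h2[1] of the h2 of interest
    nxt = next((i for i in range(start + 1, len(children))
                if children[i].tag == "h2"), None)
    # $ns2: preceding-sibling::ul of that next h2 (empty if there is none)
    preceding = set() if nxt is None else {
        i for i in range(nxt) if children[i].tag == "ul"}
    wanted = sorted(following & preceding)  # keep document order
    return [li.text.strip()
            for i in wanted for li in children[i].findall("li")]


root = ET.fromstring(DOC)
print(section_quotes(root, "Neo"))  # ['First quote', 'Second quote']
```

As with the XPath expression, a section with no following h2 yields nothing, because the second set is empty.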
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"(//h2[span/@id='Neo'])[1]/following-sibling::ul
[count(.
|
(//h2[span/@id='Neo'])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
=
count((//h2[span/@id='Neo'])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
]
/li
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<html>
<h2><span class="mw-headline" id="Neo">Neo</span></h2>
<ul>
<li> First quote </li>
</ul>
<ul>
<li> Second quote </li>
</ul>
<h2><span class="mw-headline" id="dont wanna this">Useless</span></h2>
</html>
the XPath expression is evaluated, and the selected nodes are copied to the output:
<li> First quote </li>
<li> Second quote </li>
Explanation:
This follows from the Kayessian formula (named after Dr. Michael Kay) for the intersection of two node-sets:
$ns1[count(.|$ns2) = count($ns2)]
The above selects exactly the nodes that belong both to the node-set $ns1 and to the node-set $ns2.
So, we substitute $ns1 with the node-set consisting of all following-sibling ul elements of the h2 of interest. We substitute $ns2 with the node-set consisting of all preceding-sibling ul elements of the h2 that is the immediate (1st) following sibling h2 of the h2 of interest.
The intersection of these two node-sets contains exactly the wanted ul elements.
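The reason the formula works is that XPath node-sets contain no duplicates and compare nodes by identity: a node from $ns1 is in $ns2 exactly when adding it to $ns2 does not increase the count. A minimal Python sketch of that idea, modeling node-sets as lists of objects compared by id(); the Node class and kayessian_intersection name are hypothetical, for illustration only.

```python
class Node:
    """Hypothetical stand-in for an XML node; identity is what matters."""
    def __init__(self, name):
        self.name = name


def kayessian_intersection(ns1, ns2):
    """Mimic $ns1[count(. | $ns2) = count($ns2)].

    The union with `.` is modeled as a set of object ids, so it never
    holds duplicates, just like an XPath node-set.
    """
    ns2_ids = {id(n) for n in ns2}
    return [n for n in ns1 if len(ns2_ids | {id(n)}) == len(ns2_ids)]


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
ns1 = [a, b, c]  # e.g. the ul elements following the h2 of interest
ns2 = [b, c, d]  # e.g. the ul elements preceding the next h2
print([n.name for n in kayessian_intersection(ns1, ns2)])  # ['b', 'c']
```

Filtering $ns1 keeps its order, which in XPath corresponds to keeping the selected nodes in document order.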
Update: In a comment the OP states that he only knows the results must come from the first section; the string "Neo" isn't known in advance.
Here is the modified solution:
(//h2[span/@id=$vSectionId])[1]
/following-sibling::ul
[count(.
|
(//h2[span/@id=$vSectionId])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
=
count((//h2[span/@id=$vSectionId])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
]
/li
The variable $vSectionId must be obtained as the string value of the following XPath expression:
substring(//div[h2='Contents']
/following-sibling::ul[1]
/li[1]/a/@href,
2)
Here we get the wanted id from the href of the a in the first Table of Contents entry, skipping its first character "#".
Here, again, is an XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vSectionId" select=
"substring(//div[h2='Contents']
/following-sibling::ul[1]
/li[1]/a/@href,
2)
"/>
<xsl:template match="/">
<xsl:copy-of select=
"(//h2[span/@id=$vSectionId])[1]
/following-sibling::ul
[count(.
|
(//h2[span/@id=$vSectionId])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
=
count((//h2[span/@id=$vSectionId])[1]
/following-sibling::h2[1]
/preceding-sibling::ul
)
]
/li
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the complete XML document at http://en.wikiquote.org/wiki/The_Matrix, the result of these two XPath expressions (substituting the result of the first into the second, then evaluating the second) is the wanted, correct one:
<li>I know you're out there. I can feel you now. I know that you're afraid. You're afraid of us. You're afraid of change. I don't know the future. I didn't come here to tell you how this is going to end. I came here to tell you how it's going to begin. I'm going to hang up this phone, and then I'm going to show these people what you don't want them to see. I'm going to show them a world … without you. A world without rules and controls, without borders or boundaries; a world where anything is possible. Where we go from there is a choice I leave to you.</li>
<li>Whoa.</li>
<li>I know kung-fu.</li>
<li>Yeah. Well, that sounds like a pretty good deal. But I think I may have a better one. How about, I give you the finger [He does] and you give me my phone call.</li>
<li>Guns.. lots of guns...</li>
<li>There is no spoon.</li>
<li>My name...is Neo!</li>