<body class="en-us"> <div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
</a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level"><strong>85</strong></span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np"><span class="arrow"><span class=
"icon">Character Summary</span></span></a></li>
<li class="root-menu"><a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np"><span class=
"arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=" active"><a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np"><span class="arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=""><a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np"><span class="arrow"><span class=
"icon">General</span></span></a></li>
I know that I have posted a lot of useless code here, but I wanted you guys to have an idea of what the DOM looks like.
From this:
<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>
I would like to extract this:
/wow/en/character/some-server/sometoon/achievement#92
which comes from the last anchor in the posted markup.
I have read as much as I can find on how to use an XPath query to extract the needed information, but I am clearly missing something. Below is the query that I thought should work, but it does not.
<?php
$query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
echo $query . "<br>";
$achievementSubCategory = $xpath->query($query);
$achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
var_dump($achiSubArray);
// Produces array(1) { ["URL"]=> NULL } which should look something more like:
// array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>
Thank you in advance for your assistance and advice
*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href
There are a few problems with this XPath expression:
It is looking for a ul element that is a grandchild of the current node and that has an attribute named class whose string value is equal to the string value of one of the child elements of that ul named profile-sidebar-menu. However, the ul has no children named profile-sidebar-menu, so the predicate is never true and the whole expression doesn't select any node. To compare against the literal string, the value must be quoted: @class="profile-sidebar-menu".
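To make the quoting issue concrete, here is a minimal PHP sketch that evaluates both forms of the predicate. The tiny document in $xml is an assumed stand-in for the posted markup, not the real page.

```php
<?php
// Assumed, trimmed stand-in for the posted menu markup.
$xml = '<div><ul class="profile-sidebar-menu"><li><a href="/a">A</a></li></ul></div>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Unquoted: profile-sidebar-menu is parsed as a child-element name test,
// so @class is compared against an empty node-set and nothing matches.
$unquoted = $xpath->query('//ul[@class=profile-sidebar-menu]');

// Quoted: @class is compared against the string literal.
$quoted = $xpath->query('//ul[@class="profile-sidebar-menu"]');

echo $unquoted->length, "\n"; // 0
echo $quoted->length, "\n";   // 1
```

Both expressions are syntactically valid, which is why the unquoted one fails silently instead of raising an error.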
Another problem is the indexing. li[3] selects the third li child of the context node. However, the wanted a element is a child of the fourth li child of the context node. This must be expressed as li[4]. XPath positions are 1-based, not 0-based.
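If these two problems are corrected, and the extra /ul step after the predicate is removed (in the posted markup the li elements are direct children of the menu ul), I believe that the corrected expression should look like the following:
*/ul[@class="profile-sidebar-menu"]/li[4]/a/@href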
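As a rough end-to-end sketch in PHP, the following loads a trimmed copy of the posted menu and pulls out the wanted href. It assumes the li elements are direct children of the menu ul, as in the posted markup, and it uses a leading // (my choice, not part of the expression above) so that the result does not depend on the context node. The $html string and the null guard are illustrative additions.

```php
<?php
// Trimmed copy of the posted sidebar menu (assumption: only the menu matters here).
$html = <<<'HTML'
<div class="profile-sidebar-contents">
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href="/wow/en/character/some-server/sometoon/" class="back-to" rel="np">Character Summary</a></li>
<li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement" class="back-to" rel="np">Achievements</a></li>
<li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary" class="" rel="np">Achievements</a></li>
<li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np">General</a></li>
</ul>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[4]/a/@href');

// Guard against an empty result instead of dereferencing item(0) blindly.
$href = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;

$achiSubArray = array("URL" => $href);
var_dump($achiSubArray);
```

Checking $nodes->length first makes a non-matching expression show up as an explicit null rather than a notice about a property of a non-object.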
The absolute XPath expression that selects the wanted href attribute, starting from the top element body of the provided XML document, is:
/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href
Below is the XML document (the provided one, made well-formed by appending a number of missing end tags):
<body class="en-us">
<div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style=""></a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level">
<strong>85</strong>
</span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li>
<a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np">
<span class="arrow">
<span class=
"icon">Character Summary</span></span>
</a>
</li>
<li class="root-menu">
<a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np">
<span class=
"arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class=" active">
<a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np">
<span class="arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class="">
<a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np">
<span class="arrow">
<span class=
"icon">General</span></span>
</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
One can check that the above absolute XPath expression selects exactly the wanted href attribute by evaluating it with a tool like the XPath Visualizer.
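If no such tool is at hand, the same check can be sketched in PHP with DOMXPath::evaluate, which returns scalar results for count() and string(). The skeleton document below is an assumption: it keeps only the nesting depth of the well-formed document (body plus nine nested div elements) and omits sibling content that does not affect the path.

```php
<?php
// Skeleton with the same nesting depth as the well-formed document:
// body, nine nested divs, then the menu ul with four li children.
$xml = '<body>' . str_repeat('<div>', 9)
     . '<ul><li/><li/><li/>'
     . '<li><a href="/wow/en/character/some-server/sometoon/achievement#92"/></li>'
     . '</ul>'
     . str_repeat('</div>', 9) . '</body>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$expr = '/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href';

// count() confirms that exactly one node is selected;
// string() returns the value of that attribute.
$count = $xpath->evaluate("count($expr)");
$value = $xpath->evaluate("string($expr)");

echo $count, "\n"; // 1
echo $value, "\n"; // /wow/en/character/some-server/sometoon/achievement#92
```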
Here is a snapshot of the selection, performed with the XPath Visualizer: