After calling file_get_contents, I receive this HTML:
<h1>
<a href="blablabla.html">Manhattan Skyline</a>
</h1>
I want to get the blablabla.html part only.
How can I parse it with the DOMDocument class in PHP?
Important: the HTML I receive contains more than one <a href="...">.
What I tried is:
$page = file_get_contents('https://...');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xp = new DOMXpath($dom);
$url = $xp->query('h1//a[@href=""]');
$url = $url->item(0)->getAttribute('href');
Thanks for your help.
h1//a[@href=""] looks for an a element whose href attribute has the empty string as its value, whereas your href attribute contains something other than the empty string.
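To illustrate the difference, here is a minimal sketch that counts the matches for both predicates. It assumes the markup from the question as a hard-coded string (the variable names are only for illustration) and uses a leading // so the search is not tied to the position of the elements in the document.

```php
<?php
// Sample markup copied from the question; in practice it would come from file_get_contents().
$html = '<h1><a href="blablabla.html">Manhattan Skyline</a></h1>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// [@href=""] matches only anchors whose href is exactly the empty string.
$emptyHref = $xp->query('//a[@href=""]');

// [@href] matches anchors that have an href attribute, whatever its value.
$anyHref = $xp->query('//a[@href]');

echo $emptyHref->length, "\n"; // 0
echo $anyHref->length, "\n";   // 1
```

With an empty node list, item(0) returns null, which is why the getAttribute call in the question fails.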
If that's the entire document, then you could use the expression //a. Otherwise, h1//a should work as well.
If you require the a element to have an href attribute with any kind of value, you could use h1//a[@href].
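If the h1 is not at the root of the document, you might want to use //h1 instead. With DOMDocument::loadHTML this is normally the case, because the parser wraps the markup in html and body elements. So the last example would become //h1//a[@href].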
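Putting it together, a minimal sketch could look like the following. The helper name extractHeadingLinks and the sample markup are assumptions made for illustration, not part of the question. Since the real page contains more than one link, the sketch collects every match instead of relying on item(0).

```php
<?php
// Hypothetical helper: returns the href of every <a> nested inside an <h1>.
function extractHeadingLinks(string $html): array
{
    $dom = new DOMDocument();

    // Real-world pages are rarely valid HTML; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $links = [];
    // //h1 finds the heading anywhere in the document, [@href] skips anchors without an href.
    foreach ($xp->query('//h1//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Assumed sample input: one link outside the heading, one inside it.
$page = '<p><a href="other.html">Other</a></p>'
      . '<h1><a href="blablabla.html">Manhattan Skyline</a></h1>';

$links = extractHeadingLinks($page);
echo $links[0] ?? 'no link found', "\n"; // blablabla.html
```

Returning an array keeps the caller safe when nothing matches: the result is simply empty rather than a null that getAttribute would be called on.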