I have the following:
$html = "<a href="/path/to/page.html" title="Page name"><img src="path/to/image.jpg" alt="Alt name" />Page name</a>"
I need to extract href and src attribute and anchor text
My solution:
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
$href = $node->getAttribute('href');
$title = $node->nodeValue;
}
foreach ($dom->getElementsByTagName('img') as $node) {
$img = $node->getAttribute('src');
}
What would be the smarter way?
You can avoid the loops if you use DOMXPath
to grab the elements directly:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXpath( $dom);
$a = $xpath->query( '//a')->item( 0); // Get the first <a> node
$img = $xpath->query( '//img', $a)->item( 0); // Get the <img> child of that <a>
Now, you can do:
echo $a->getAttribute('href');
echo $a->nodeValue;
echo $img->getAttribute('src');
This will print:
/path/to/page.html
Page name
path/to/image.jpg