php xpath html-content-extraction

XPATH/PHP - Smarter way to accomplish this?


I have the following:

$html = '<a href="/path/to/page.html" title="Page name"><img src="path/to/image.jpg" alt="Alt name" />Page name</a>';

I need to extract the href and src attributes and the anchor text.

My solution:

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) { 
    $href = $node->getAttribute('href');
    $title = $node->nodeValue;
}
foreach ($dom->getElementsByTagName('img') as $node) { 
    $img = $node->getAttribute('src');
}

What would be the smarter way?


Solution

  • You can avoid the loops if you use DOMXPath to grab the elements directly:

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    
    $a = $xpath->query('//a')->item(0);           // Get the first <a> node
    $img = $xpath->query('.//img', $a)->item(0);  // Get the <img> inside that <a>; the leading "." makes the path relative to $a
    

    Now, you can do:

    echo $a->getAttribute('href'), PHP_EOL;
    echo $a->nodeValue, PHP_EOL;
    echo $img->getAttribute('src'), PHP_EOL;
    

    This will print:

    /path/to/page.html 
    Page name 
    path/to/image.jpg
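
If the real input can contain more than one link, or links without an image, the same idea can be wrapped in a small function. The following is only a sketch under those assumptions: `extractLinks()` is a hypothetical helper name, and the second anchor in `$html` is made up for illustration. It relies on the fact that an XPath starting with `//` always searches the whole document, even when a context node is passed, so the relative form `.//img` is used to stay inside each `<a>`.

```php
<?php
// Hypothetical helper (not part of the original answer): collects the href,
// the anchor text, and the first image src for every <a> in an HTML fragment.
function extractLinks(string $html): array
{
    $dom = new DOMDocument;

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];

    foreach ($xpath->query('//a') as $a) {
        // The leading "." keeps the search inside the current <a>.
        $img = $xpath->query('.//img', $a)->item(0);

        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
            // item(0) returns null when the anchor contains no <img>.
            'src'  => $img ? $img->getAttribute('src') : null,
        ];
    }

    return $links;
}

// Sample input: the anchor from the question plus an invented image-less one.
$html = '<a href="/path/to/page.html" title="Page name"><img src="path/to/image.jpg" alt="Alt name" />Page name</a>'
      . '<a href="/other.html">No image here</a>';

print_r(extractLinks($html));
```

The loop is back, but it now runs once per link rather than once per tag name, and each image is tied to the anchor that contains it instead of being overwritten by the last `<img>` in the document.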