
How to get img's src and data from its sibling nodes


<?php 
$htmlget = new DOMDocument();

@$htmlget->loadHTMLFile('http://www.amazon.com');

$xpath = new DOMXPath( $htmlget);
$nodelist = $xpath->query( "//img/@src" );

foreach ($nodelist as $images){
    $value = $images->nodeValue;
}
?>

I got all the img tags, but how do I get the information surrounding each image within the same parent element?

For example, on amazon.com there's a Kindle. I have the picture now, but I need the information around it, such as the price and description.


Solution

  • It depends on the markup of the requested page. Here is an example for getting the price on Amazon:

    <?php
    $htmlget = new DOMDocument();
    
    // @ suppresses the warnings libxml emits for malformed real-world HTML
    @$htmlget->loadHTMLFile('http://www.amazon.com');
    
    $xpath = new DOMXPath($htmlget);
    $nodelist = $xpath->query("//img/@src");
    
    foreach ($nodelist as $imageSrc) {
        // $imageSrc is the src attribute: its parent is the <img>, its grandparent the container
        $container = $imageSrc->parentNode->parentNode;
    
        // only handle images whose container has the class "imageContainer"
        if ($container instanceof DOMElement && $container->getAttribute('class') == 'imageContainer') {
            // skip dummy images
            if (strstr($imageSrc->nodeValue, 'transparent-pixel')) continue;
    
            // point to the common ancestor of image and product details
            $wrapper = $container->parentNode->parentNode->parentNode;
    
            // fetch the price (a <span> that is a direct child of the wrapper)
            $price = $xpath->query('span[@class="red t14"]', $wrapper);
            if ($price->length) {
                echo '<br/><img src="' . $imageSrc->nodeValue . '">' . $price->item(0)->nodeValue . '<br/>';
            }
        }
    }
    ?>
    

    However, you shouldn't parse pages that way. If a site wants to provide you with information, it usually has an API. If not, it doesn't want you to grab anything. Parsing this way is not reliable: the markup of the requested page can change at any time (and you may open a door for exploits too). It also may not be legal.
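
The relative-query idea from the answer can be tried offline against a small inline document. The following is only a sketch: the markup, the class names `product`, `price` and `description`, and the helper `extractProducts()` are hypothetical and stand in for whatever structure the real page uses. Instead of chaining `parentNode` calls, it uses the XPath `ancestor::` axis to find the element that wraps both the image and its details, then queries relative to that wrapper.

```php
<?php
// Hypothetical markup standing in for a product listing; real pages will differ.
$html = <<<'HTML'
<div class="product">
  <div class="imageContainer"><a href="/kindle"><img src="kindle.jpg" alt="Kindle"></a></div>
  <span class="price">$79.00</span>
  <p class="description">E-reader with a 6-inch display</p>
</div>
<div class="product">
  <div class="imageContainer"><a href="/x"><img src="transparent-pixel.gif" alt=""></a></div>
</div>
HTML;

// Hypothetical helper: returns one array per real image, with its sibling data.
function extractProducts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $products = [];

    foreach ($xpath->query('//div[@class="imageContainer"]//img') as $img) {
        $src = $img->getAttribute('src');

        // Skip dummy images
        if (strpos($src, 'transparent-pixel') !== false) {
            continue;
        }

        // Nearest ancestor that wraps both the image and the product details
        $wrapper = $xpath->query('ancestor::div[@class="product"][1]', $img)->item(0);
        if ($wrapper === null) {
            continue;
        }

        // Queries starting with "." are evaluated relative to the wrapper
        $price = $xpath->query('.//span[@class="price"]', $wrapper)->item(0);
        $description = $xpath->query('.//p[@class="description"]', $wrapper)->item(0);

        $products[] = [
            'src'         => $src,
            'price'       => $price ? trim($price->textContent) : null,
            'description' => $description ? trim($description->textContent) : null,
        ];
    }

    return $products;
}

print_r(extractProducts($html));
```

Using `ancestor::` with a class test tends to survive small markup changes better than a fixed number of `parentNode` hops, but it is still tied to the page's current structure, so the caveats above apply equally.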