How can I get the a href, img src, and title from given HTML using DOMDocument?


Given HTML:

  <div id="testid">
  <h1>Test Title</h1>
      <ul class="clearfix">
        <li class="anker" id="artists-A"></li>
        <li class="first">
            <a href="www.test1.html" title="Test1">
            <span>
            <img src="https://www.test1.de/img/test1.jpg" alt="Test1" />
            <span>Test1</span>
            </span>
            </a>
        </li>
        <li>
            <a href="www.test2.html" title="Test2">
            <span>
            <img src="https://www.test2.de/img/test2.jpg" alt="Test2" />
            <span>Test2</span>
            </span>
            </a>
        </li>
        <li class="first">
            <a href="www.test3.html" title="Test3">
            <span>
            <img src="https://www.test1.de/img/test3.jpg" alt="Test3" />
            <span>Test3</span>
            </span>
            </a>
        </li>
      </ul> 
</div>

I need to get the a href value, the img src, and the span text (i.e. the title). I am parsing this using DOMDocument but am not getting the expected result.

Code:

$doc = new DomDocument; 
$doc->validateOnParse = true; 
$doc->loadHtml(file_get_contents($url)); 
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//*[@id="testid"]/ul/li');

Solution

  • This uses DOMDocument together with DOMXPath. For now it collects each a's href and title and each img's src; you can add any further tags you need.

    $domDocument = new DOMDocument();
    // $string holds the HTML from the question, e.g. file_get_contents($url)
    $domDocument->loadHTML($string);
    
    $domXPath = new DOMXPath($domDocument);
    // Select the div with id="testid"
    $container = $domXPath->query("//div[@id='testid']")->item(0);
    // The leading "." makes the query relative to that div;
    // a bare "//a" would search the whole document even with a context node
    $results = $domXPath->query(".//a | .//img", $container);
    $data = array();
    foreach ($results as $result) {
        if ($result->tagName == "a") { // anchor tags
            $data["a"][] = array(
                "href"  => $result->getAttribute("href"),
                "title" => $result->getAttribute("title")
            );
        } elseif ($result->tagName == "img") { // image tags
            $data["img"][] = $result->getAttribute("src");
        }
    }
    print_r($data);
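
If you would rather have the href, src, and span title grouped per list item (which is what the question seems to ask for), a minimal sketch along these lines may help. It is an illustration rather than a definitive implementation: the $html sample is trimmed from the question, $items is a made-up variable name, and it assumes every link wraps its title in the same span/span structure.

```php
<?php
// Sample markup trimmed from the question; in practice this would be the fetched page.
$html = <<<'HTML'
<div id="testid">
  <h1>Test Title</h1>
  <ul class="clearfix">
    <li class="anker" id="artists-A"></li>
    <li class="first">
      <a href="www.test1.html" title="Test1">
        <span><img src="https://www.test1.de/img/test1.jpg" alt="Test1" /><span>Test1</span></span>
      </a>
    </li>
    <li>
      <a href="www.test2.html" title="Test2">
        <span><img src="https://www.test2.de/img/test2.jpg" alt="Test2" /><span>Test2</span></span>
      </a>
    </li>
  </ul>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$items = array(); // hypothetical name: one entry per <li> that contains a link
foreach ($xpath->query('//div[@id="testid"]/ul/li') as $li) {
    // Paths starting with "." are evaluated relative to the context node.
    $a = $xpath->query('./a', $li)->item(0);
    if ($a === null) {
        continue; // skip list items without a link, such as the "anker" item
    }
    $img = $xpath->query('.//img', $a)->item(0);
    $items[] = array(
        'href'  => $a->getAttribute('href'),
        'src'   => $img !== null ? $img->getAttribute('src') : null,
        // string() yields the text of the first matching inner span
        'title' => trim($xpath->evaluate('string(.//span/span)', $a)),
    );
}
print_r($items);
```

Each entry of $items then carries the href, src, and title keys for one link, so the three values stay together instead of being split across separate "a" and "img" lists.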