Tags: php, html, xpath, web-scraping, domdocument

How to web-scrape text inside divs with DOMDocument


I am trying to get the contents of a div and, because I need the same data from other pages, to put the logic in a loop. But I am running into trouble:

<div class="article_info">
    <ul class="c-result_box">
     <li>
      <div class="inner cf">
       <div class="c-header">
         <div class="c-logo"> 
           <img src="/e/designs/31sumai/common/img/logo_08.png" alt="#"> 
             </div>
               <p class="c-supplier">三井のマンション</p>
                    <p class="c-name">
                        <a href="https://www.31sumai.com/mfr/K1503/" class="link" target="_blank">パークリュクス大阪天満</a>
                    </p>

I'm trying to get the text inside the <a> element. Here is my code; what am I missing?

$start_id = 1501;
while(true){

    $url = 'https://www.31sumai.com/mfr/K'.$start_id.'/outline.html';
    $html = file_get_contents($url);
    libxml_use_internal_errors(true);
    $DOMParser = new \DOMDocument();
    $DOMParser->loadHTML($html);
    $xpath = new \DOMXPath($DOMParser);

    $classname="c-name";
    $nodes = $finder->query("//*[contains(@class, '$classname')]");
    $MyTable = false; 
    $insertData = [];  
    foreach($nodes as $node){
        $allNames = [];
        foreach($node->getElementsByTagName('a') as $a){
            $name = $a->getElementsByTagName('a');
            $allProperties[] = [
                'names' => $name];
        }

    }

Thank you for helping!


Solution

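  • Two things stop your code from working: $finder is never defined (the DOMXPath object is stored in $xpath), and calling getElementsByTagName('a') on an <a> element returns an empty node list rather than its text. You can instead rely on the XPath query to select the text nodes you want, and then read the nodeValue property inside your loop: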

    $start_id = "1501";
    $url = "https://www.31sumai.com/mfr/K$start_id/outline.html";
    $html = file_get_contents($url);
    libxml_use_internal_errors(true);
    $DOMParser = new \DOMDocument();
    $DOMParser->loadHTML($html);
    $xpath = new \DOMXPath($DOMParser);
    
    $classname="c-name";
    
    $nodes = $xpath->query("//*[contains(@class, '$classname')]/a/text()");
    foreach($nodes as $node){
        echo $node->nodeValue, PHP_EOL; // one name per line
    }
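
If you also want the loop over several pages from the question, the same XPath idea can be wrapped in two small functions. This is only a sketch under a few assumptions: extractLinkTexts() and scrapeRange() are hypothetical helper names, the range of IDs is bounded instead of while(true), and the download step is passed in as a callable so the demo runs offline with a fake fetcher (in real use you would pass something like a wrapper around file_get_contents()). The XML prolog prepended before loadHTML() is a common workaround for UTF-8 input, not something the question's code did.

```php
<?php
declare(strict_types=1);

/**
 * Return the text of every <a> that is a direct child of an element
 * carrying the given class token. (Hypothetical helper for illustration.)
 */
function extractLinkTexts(string $html, string $className = 'c-name'): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings from real-world HTML
    $doc = new DOMDocument();
    // Common workaround so libxml treats the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the whole class token, so "c-name" does not also match "c-name-sub".
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]/a",
        $className
    );

    $texts = [];
    foreach ($xpath->query($query) as $a) {
        $texts[] = trim($a->textContent);
    }
    return $texts;
}

/**
 * Walk a bounded range of page IDs. $fetch receives a URL and returns
 * the HTML as a string, or false on failure. (Hypothetical helper.)
 */
function scrapeRange(int $firstId, int $lastId, callable $fetch): array
{
    $names = [];
    for ($id = $firstId; $id <= $lastId; $id++) {
        $html = $fetch("https://www.31sumai.com/mfr/K{$id}/outline.html");
        if ($html === false || $html === '') {
            continue; // skip pages that could not be downloaded
        }
        $names[$id] = extractLinkTexts($html);
    }
    return $names;
}

// Offline demo: a fake fetcher stands in for a real HTTP download.
$fakeFetch = function (string $url) {
    if (strpos($url, 'K1503') === false) {
        return false;
    }
    return '<p class="c-name"><a href="#">パークリュクス大阪天満</a></p>';
};
print_r(scrapeRange(1501, 1503, $fakeFetch));
```

Keeping the parsing separate from the downloading makes it easy to test the XPath against a saved HTML snippet before pointing the loop at the live site.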