
PHP Simple Html Dom extract multiple tags from one class


I am new to Simple HTML DOM with PHP and I am struggling to extract multiple HTML tags from elements that share one class. I have multiple blocks of HTML like this in a single page:

    <div class="file-right"> 
         <a href="/secrets-of-the-millionaire-mind-tomocubcom-e17682584.html" class="ai-similar" data-id="17682584" data-loc="3">
           <h2><b>Secrets</b> of the <b>Millionaire</b> <b>Mind</b> - TOMOCUB.COM</h2>
         </a>
           <span class="fi-pagecount">223 Pages</span>
           <span class="fi-year">2005</span>
           <span class="fi-size hidemobile">1015 KB</span>
         </div>
     2 - <b>Secrets</b> of the <b>Millionaire</b> <b>Mind</b> and your achievement of <b>success</b>. As you’ve probably fo&nbsp;...
   </div> 

From each of these blocks I want to extract:

  1. href link
  2. the plain text in tags
  3. the text of each of the 3 span elements

I have been trying to do this in PHP but keep getting errors. This is the code I have so far:

$html = @str_get_html($response);
$allblocks=$html->find('div.file-right'); //this selects all file-right blocks
if(isset($allblocks)){
   foreach($allblocks as $singleblock){
      echo $singleblock->plaintext; // but i get an error here PHP Notice:  Array to string conversion

   }
}

Can anyone help me, please?


Solution

  • You need to take the HTML apart layer by layer. You started by finding the <div> tag. From that you can find the <a> tag within the <div> and then get its href attribute (using ->href). This code assumes that there is only one <a> tag, so rather than a foreach it just uses the first one (using [0]).

    The <span> tags follow a similar process, but as there are repeated elements, this time the code uses a foreach. It outputs the class attribute and the contents of each span.

    $html = str_get_html($response);
    $allblocks = $html->find('div.file-right'); // this selects all file-right blocks
    if ( count($allblocks) > 0 ) {
        foreach ( $allblocks as $block ) {
            // Assumes exactly one <a> per block, so only the first match is used
            $anchor = $block->find("a");
            echo "href=".$anchor[0]->href.PHP_EOL;
            echo "text=".$anchor[0]->plaintext.PHP_EOL;
            // There are several <span> tags per block, so loop over all of them
            $spans = $block->find("span");
            foreach ( $spans as $span ) {
                echo "span=".$span->class."=".$span->plaintext.PHP_EOL;
            }
        }
    }
    

    Note that your original code used isset($allblocks). Because the line before sets its value, the variable will still have a value even if find() didn't find anything. In this code I use count() to check whether anything was returned by the previous call to find().
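
To make the isset() versus count() difference concrete, here is a minimal sketch in plain PHP. It assumes that find() hands back an empty array when nothing matches; $noMatches is only a stand-in for that result, not something the library provides.

```php
<?php
// Stand-in for what find() is assumed to return when nothing matches.
$noMatches = [];

// isset() only checks that the variable exists and is not null,
// so it is true even though the array is empty.
var_dump(isset($noMatches));      // bool(true)

// count() looks at the number of elements, which is what we care about.
var_dump(count($noMatches) > 0);  // bool(false)
```

So an isset() check lets the code fall through to the loop either way, while count() actually tells you whether there is anything to process.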

    With your sample HTML, wrapped only in a minimal page, the output is:

    href=/secrets-of-the-millionaire-mind-tomocubcom-e17682584.html
    text=            Secrets of the Millionaire Mind - TOMOCUB.COM          
    span=fi-pagecount=223 Pages 
    span=fi-year=2005 
    span=fi-size hidemobile=1015 KB
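
The text= line in the output above still carries the leading and trailing whitespace from the source markup. If that matters, one option is to tidy each value before printing it. The sketch below uses a hypothetical helper, cleanText(), which is not part of Simple HTML DOM; it relies only on PHP's built-in trim() and preg_replace().

```php
<?php
// Hypothetical helper (not part of simple_html_dom): trims both ends and
// collapses internal runs of whitespace into single spaces.
function cleanText(string $text): string
{
    return preg_replace('/\s+/', ' ', trim($text));
}

// Sample value copied from the output above.
$raw = "            Secrets of the Millionaire Mind - TOMOCUB.COM          ";
echo "text=" . cleanText($raw) . PHP_EOL;
// prints: text=Secrets of the Millionaire Mind - TOMOCUB.COM
```

In the loop from the answer, the same helper could wrap $anchor[0]->plaintext and $span->plaintext.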