
How can I parse HTML with Simple HTML DOM Parser?


I have some website content that I want to parse with Simple HTML DOM Parser. It looks something like this:

...
<div id="page-content">
   <div id="search-results-main" class="wide">
      <table class="search-results">
         <thead>...</thead>
           <tbody>
              <tr id="ad-123123">
                 <td class="thumbnail">...</td>
              </tr>
              ...
           </tbody>
      </table>
   </div>
</div>
...

This is my code right now:

include('./simple_html_dom.php');

$html = file_get_html('http://www.domain.com/subsite');
$searchResults = $html->find('table[@class=search-results'); 

foreach($searchResults->find('tr[@id^=ad-]') as $tr) {
...
}

The problem is that I get this error right now:

mod_fcgid: stderr: PHP Fatal error:  Call to a member function find() on a non-object in /data/domains/mydomain/web/webroot/path/to/script.php on line 31

$html is not null; I have already checked that. I get the same error if I look up the table with this code instead:

$searchResults = $html->find('.search-results'); 

What could be the problem?


Solution

  • There are two problems in your script:

    First, your search pattern is missing its closing square bracket (probably a typo). This line:

    $searchResults = $html->find('table[@class=search-results'); 
    

    must be:

    $searchResults = $html->find('table[@class=search-results]'); 
    #                                                        ↑
    

    Second, when called without an index, ->find() returns an array of objects, not a single object, so you have to pick an element from that array before calling ->find() on it:

    foreach( $searchResults[0]->find( 'tr[@id^=ad-]' ) as $tr )
    #                      ↑↑↑
    

    As an alternative, you can use this syntax:

    $searchResult = $html->find( 'table[@class=search-results]', 0 ); 
    foreach( $searchResult->find( 'tr[@id^=ad-]' ) as $tr )
    

    The second argument of ->find() is an index: with 0, it returns only the first matched node instead of an array, or null if nothing matches.
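
Because the indexed form of ->find() returns null when nothing matches, the same fatal error can come back if the table is ever missing from the page. The following is a minimal sketch of a guarded version, not a definitive implementation. The helper name collectAdRowIds() and the inline $sample markup are made up for illustration, and it assumes simple_html_dom.php sits next to the script, as in the question. It uses str_get_html() on a string so it can be tried without fetching a URL.

```php
<?php
// Assumes simple_html_dom.php sits next to this script, as in the question.
include('./simple_html_dom.php');

// Hypothetical helper: returns the id of every <tr id="ad-..."> row inside
// the first table with class "search-results", or an empty array if the
// markup cannot be parsed or the table is not found.
function collectAdRowIds($markup)
{
    $html = str_get_html($markup);
    if (!$html) {
        return array();
    }

    // Index 0 asks for a single node; the result is null when nothing matches.
    $table = $html->find('table[@class=search-results]', 0);
    if ($table === null) {
        $html->clear();
        return array();
    }

    $ids = array();
    foreach ($table->find('tr[@id^=ad-]') as $tr) {
        $ids[] = $tr->id;
    }

    $html->clear(); // release the parser's memory
    return $ids;
}

// Illustrative markup modelled on the structure shown in the question.
$sample = '<div id="page-content"><table class="search-results"><tbody>'
        . '<tr id="ad-123123"><td class="thumbnail">a</td></tr>'
        . '<tr id="ad-456456"><td class="thumbnail">b</td></tr>'
        . '</tbody></table></div>';

print_r(collectAdRowIds($sample));
```

For the real page, the string passed to the helper could come from file_get_contents() on the URL; checking the result of each step before calling ->find() on it is what prevents the "non-object" error.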