Tags: php, html, simple-html-dom

How to find specific data using Simple HTML DOM in PHP


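When I scrape the table, the rows (tr) and cells (td) are not always the same. Some pages have rows that others lack, so a value does not always sit at the same position. Below are two versions of the original table.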

<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Location</td><td class="fullhead">Madrid</td></tr>
<tr><td class="jdhead">Country</td><td class="fullhead">Spain</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>

<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>

The two tables above come from different pages. From each one I need to scrape Name, Phone and Role.

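<?php
// Simple HTML DOM parser (the library file is usually named simple_html_dom.php)
include 'simple_html_dom.php';

$url = "http://name.com/listings";
$html = file_get_html( $url );

// find() with an index argument (e.g. 1) returns a single element, not a list,
// so omit the index to loop over every "fullhead" cell
$posts1 = $html->find('td[class=fullhead]');

// This prints every value, but not which label each one belongs to
foreach ( $posts1 as $post1 ) {
    $poster1 = $post1->outertext;
    echo $poster1;
}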

Solution

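  • I would use preg_match to pull the needed values straight out of the HTML, locating each value by the label in the cell before it rather than by its position, like this: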

    <?php
    $url = 'http://name.com/listings';
    $html = file_get_contents($url);
    
    if (preg_match('~<tr><td class="jdhead">Name</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
        echo $matches[1]; // here is your name
    }
    
    if (preg_match('~<tr><td class="jdhead">Phone</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
        echo $matches[1]; // here is your phone
    }
    
    if (preg_match('~<tr><td class="jdhead">Role</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
        echo $matches[1]; // here is your role
    }
    

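    Update (see the comments): the same approach applied to the real listing page, whose rows use uppercase tags and the class detailJob for the value cells: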

    <?php
    $url = 'http://jobsearch.naukri.com/job-listings-010915006292';
    $html = file_get_contents($url);
    
    // \s* tolerates any whitespace between tags, the i flag ignores tag case,
    // and the s flag lets (.*?) match a description that spans several lines
    if (preg_match('~<TR VALIGN="top">\s*<TD CLASS="jdHead">Job Posted\s*</TD>\s*<TD VALIGN="top" CLASS="detailJob">([^<]*)</TD>\s*</TR>~i', $html, $matches)) {
        echo 'Job Posted: ' . $matches[1] . '<br><br>';
    }
    
    if (preg_match('~<TR VALIGN="top">\s*<TD CLASS="jdHead">Job Description\s*</TD>\s*<TD VALIGN="top" CLASS="detailJob">(.*?)</TD>\s*</TR>~is', $html, $matches)) {
        echo 'Job Description: ' . $matches[1] . '<br><br>';
    }
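The three preg_match calls in the first answer follow one pattern, so they can be folded into a single function. This is a minimal sketch, assuming the label/value cell markup shown in the question. The function extractField and its class parameters are hypothetical names. The i modifier makes tag and class names case-insensitive, so the same helper can also target the uppercase jdHead/detailJob markup from the update. The s modifier lets (.*?) span line breaks in a multi-line value.

```php
<?php
// Sketch only: generalises the preg_match approach into one helper.
// extractField() and its parameter names are hypothetical, not a library API.

/**
 * Return the text of the value cell that directly follows the given label cell,
 * or null when that label does not appear in $html.
 */
function extractField(
    string $html,
    string $label,
    string $labelClass = 'jdhead',
    string $valueClass = 'fullhead'
): ?string {
    // preg_quote() escapes regex metacharacters and the ~ delimiter
    $pattern = '~<td[^>]*class="' . preg_quote($labelClass, '~') . '"[^>]*>\s*'
        . preg_quote($label, '~') . '\s*</td>\s*'
        . '<td[^>]*class="' . preg_quote($valueClass, '~') . '"[^>]*>(.*?)</td>~is';
    // i: tag/attribute text in any case; s: let .*? cross line breaks
    if (preg_match($pattern, $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Usage on a fragment of the question's table (hypothetical sample input)
$sample = '<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>'
    . '<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>';

foreach (['Name', 'Phone', 'Role'] as $label) {
    $value = extractField($sample, $label);
    echo $label . ': ' . ($value ?? 'not found') . "\n";
}
```

Because each value is located by its label, a row that is missing on one page simply returns null instead of shifting the other fields. The helper still depends on the exact attribute layout, which is the usual trade-off of parsing HTML with regular expressions.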
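A regular expression breaks as soon as the markup changes shape, so an HTML parser is a sturdier option. Below is a minimal sketch using PHP's built-in DOMDocument and DOMXPath, swapped in for Simple HTML DOM so it runs without the third-party library. The name scrapeScoreTable is hypothetical. It reads every jdhead/fullhead pair of a scoretable into an array keyed by the label.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension instead of Simple HTML DOM.
// scrapeScoreTable() is a hypothetical helper name, not part of any library.

function scrapeScoreTable(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $fields = [];

    // Walk every row of every table whose class contains "scoretable"
    foreach ($xpath->query('//table[contains(@class, "scoretable")]//tr') as $row) {
        $label = $xpath->query('./td[contains(@class, "jdhead")]', $row)->item(0);
        $value = $xpath->query('./td[contains(@class, "fullhead")]', $row)->item(0);

        // Key each value by its label, so a missing row cannot shift the others
        if ($label !== null && $value !== null) {
            $fields[trim($label->textContent)] = trim($value->textContent);
        }
    }
    return $fields;
}

// Usage with the second (shorter) table from the question
$page = '<table class="scoretable"><tbody>'
    . '<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>'
    . '<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>'
    . '<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>'
    . '<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>'
    . '</tbody></table>';

$fields = scrapeScoreTable($page);
// Keep only the fields the question asks for
$wanted = array_intersect_key($fields, array_flip(['Name', 'Phone', 'Role']));
foreach ($wanted as $name => $value) {
    echo $name . ': ' . $value . "\n";
}
```

Picking Name, Phone and Role from that array works the same on both pages, since a row that is absent (Location or Country) simply leaves its key out.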