
How Do I Scrape These Two Tables Using Simple HTML DOM?


I've been trying to figure out how to use PHP Simple HTML DOM to scrape each td with class="job" along with its respective salary. I can find and scrape divs by id or class with no problem, but I'm not sure how to approach a table like this. Any help would be appreciated!

<table cellpadding="0" cellspacing="0" border="0" class="table01">
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Illustrator" id="UniqueID1">Illustrator</a><br/>
    $23,729 - $95,429
    </td>
</tr>
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Small_Business_Owner_%2f_Operator" id="UniqueID2">Small Business Owner / Operator</a><br/>
    $24,369 - $174,991
    </td>
</tr>
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Waiter%2fWaitress" id="UniqueID3">Waiter/Waitress</a><br/>
    $7,483 - $34,188
    </td>
</tr>
</table>

<table cellpadding="0" cellspacing="0" border="0" class="table02">
<tr>
    <td class="head">Test</td>
    <td class="job" style="padding-right: 20px">
    <a href="/Graphic_Artist_%2f_Designer" id="UniqueID1">Graphic Artist / Designer</a><br/>
    $23,789 - $55,409
    </td>
</tr>
<tr>
    <td class="head">Test</td>
    <td class="job" style="padding-right: 20px">
    <a href="/Illustrator" id="UniqueID2">Illustrator</a><br/>
    $23,729 - $95,429
    </td>
</tr>    
<tr>
    <td class="head">Test</td>
    <td class="job" style="padding-right: 20px">
    <a href="/Art_Director" id="UniqueID3">Art Director</a><br/>
    $34,160 - $85,943
    </td>
</tr>
</table>

Solution

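    $html = "your html data";

    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them; scraped HTML is rarely valid.
    libxml_use_internal_errors(true);
    // Load the HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Select every td whose class list contains the token "job".
    // A plain contains(@class, 'job') would also match classes such as "jobs".
    $my_xpath_query = "//table//td[contains(concat(' ', normalize-space(@class), ' '), ' job ')]";
    $result_rows = $xpath->query($my_xpath_query);

    // Iterate over the matched cells; each one holds the job title followed by its salary range.
    foreach ($result_rows as $result_object) {
        echo trim($result_object->nodeValue), PHP_EOL;
    }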
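
The loop above prints each cell as one block of text. Since the question asks for each job together with its salary, here is a minimal sketch of one way to split the two, using PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM. The function name extractJobs and the array keys title, url and salary are made up for illustration, and the sample markup is a shortened copy of the first table from the question. It assumes each job cell contains one link followed by the salary as plain text.

```php
<?php
// Shortened copy of the first table from the question.
$html = <<<'HTML'
<table class="table01">
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Illustrator" id="UniqueID1">Illustrator</a><br/>
    $23,729 - $95,429
    </td>
</tr>
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Waiter%2fWaitress" id="UniqueID3">Waiter/Waitress</a><br/>
    $7,483 - $34,188
    </td>
</tr>
</table>
HTML;

// Hypothetical helper: returns one array per job cell.
function extractJobs(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parse warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $jobs = [];
    $cells = $xpath->query("//td[contains(concat(' ', normalize-space(@class), ' '), ' job ')]");
    foreach ($cells as $cell) {
        // The job title is the text of the first link directly inside the cell.
        $link = $xpath->query('./a', $cell)->item(0);
        if ($link === null) {
            continue; // skip cells that do not follow the assumed layout
        }

        // The salary is whatever text sits directly in the cell, outside the link.
        $salary = '';
        foreach ($xpath->query('./text()', $cell) as $textNode) {
            $salary .= $textNode->nodeValue;
        }

        $jobs[] = [
            'title'  => trim($link->textContent),
            'url'    => $link->getAttribute('href'),
            'salary' => trim($salary),
        ];
    }
    return $jobs;
}

$jobs = extractJobs($html);
foreach ($jobs as $job) {
    echo $job['title'], ': ', $job['salary'], PHP_EOL;
}
```

With the sample markup this should print one "title: salary" line per row, for example "Illustrator: $23,729 - $95,429". Because the query starts at //td, it picks up the job cells of both tables when given the full page.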