I'm trying to get the proxy and port values from the output HTML page of this JSBin: http://jsbin.com/noxuqusoga/edit?html
Here is a sample of the table structure from that page. It includes only one <tr>, but the actual HTML has many <tr> elements with a similar structure:
<table class="table" id="tbl_proxy_list" width="950">
<tbody>
<tr data-proxy-id="1355950">
<td align="left"><abbr title="103.227.175.125">103.227.175.125 </abbr></td>
<td align="left"><a href="/proxy-server-list/port-8080/" title="Port 8080 proxies">8080</a></td>
<td align="left"><time class="icon icon-check timeago" datetime="2018-08-18 04:56:47Z">9 min ago</time></td>
<td align="left">
<div class="progress-bar" data-value="22" title="1089">
<div class="progress-bar-inner" style="width:22%; background-color: hsl(26.4,100%,50%);"> </div>
</div>
<small>1089 ms</small></td>
<td style="text-align:center !important;"><span style="color:#009900;">95%</span> <span> (94)</span></td>
<td align="left"><img alt="sg" class="flag flag-sg" src="/assets/images/blank.gif" style="vertical-align: middle;" /> <a href="/proxy-server-list/country-sg/" title="Proxies from Singapore">Singapore <span class="proxy-city"> - Bukit Timah </span> </a></td>
<td align="left"><span class="proxy_transparent" style="font-weight:bold; font-size:10px;">Transparent</span></td>
<td><span>-</span></td>
</tr>
</tbody>
</table>
I'm able to scrape the proxy address, but I'm having difficulty with the port: its <td> has no id or class, and in some rows the value is wrapped in a hyperlink while in others it is not.
How can I get output in the form ip:port for every row of the scraped result?
Here's my code:
$html = file_get_html('http://jsbin.com/noxuqusoga/');
// Print the title attribute of every abbr element (the proxy addresses)
foreach($html->find('abbr') as $element)
echo $element->title . '<br>';
foreach($html->find('td a') as $element)
echo $element->plaintext . '<br>';
Please help,
Thanks
Instead of writing a selector for the <td> elements (or elements inside them, like <abbr> or <a>), write a selector for their parent <tr>. Then loop over these <tr> rows and, for each row, get the children of that row which you need:
// Select all tr elements inside tbody
foreach ($html->find('tbody tr') as $row) {
    // the second parameter (zero) indicates we only need the first element matching our selector
    // note: find() returns null when the row has no matching element
    // ip is in the first <abbr> element inside a td
    $ip = $row->find('td abbr', 0)->plaintext;
    // port is in the first <a> element inside a td
    $port = $row->find('td a', 0)->plaintext;
    print "$ip:$port\n";
}
As an alternative, besides using CSS selectors you also have the option to get elements by their index. In your case, what you want from each <tr> is in its first and second <td> elements, so you can also take the first and second child of each <tr> to extract the data. This works even for rows where the port is not wrapped in a link, because you read the text of the <td> itself.
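The index-based approach could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes simple_html_dom.php is available on the include path, the helper name extractProxies is hypothetical, and the second sample row (10.0.0.1, port 3128 with no link) is made up to show the no-hyperlink case.

```php
<?php
// Assumption: the PHP Simple HTML DOM Parser file is on the include path.
require_once 'simple_html_dom.php';

// Hypothetical helper: returns one "ip:port" string per table row.
function extractProxies($html)
{
    $proxies = array();
    foreach ($html->find('#tbl_proxy_list tbody tr') as $row) {
        // children(0) and children(1) are the first and second <td> of the row
        $ipCell = $row->children(0);
        $portCell = $row->children(1);
        if ($ipCell === null || $portCell === null) {
            continue; // skip rows that have fewer than two cells
        }
        // plaintext drops the markup, so a port with or without <a> reads the same
        $proxies[] = trim($ipCell->plaintext) . ':' . trim($portCell->plaintext);
    }
    return $proxies;
}

// Trimmed-down sample markup; the second row is invented for illustration.
$sample = '<table id="tbl_proxy_list"><tbody>'
    . '<tr><td><abbr title="103.227.175.125">103.227.175.125 </abbr></td>'
    . '<td><a href="/proxy-server-list/port-8080/">8080</a></td></tr>'
    . '<tr><td><abbr title="10.0.0.1">10.0.0.1</abbr></td><td>3128</td></tr>'
    . '</tbody></table>';

foreach (extractProxies(str_get_html($sample)) as $proxy) {
    echo $proxy . "\n";
}
```

For the real page you would pass the result of file_get_html('http://jsbin.com/noxuqusoga/') to the helper instead of the sample string. The trim() calls are there because the sample cells contain trailing spaces.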