I'm fetching an HTML webpage with file_get_contents() and get a table like the one below, with more than 150 rows:
<tr class="tabrow ">
<td class="tabcol tdmin_2l">FIRST+DATA</td>
<td class="tabcol">
<a class="modal-button" title="SECOND+DATA" href="THIRD+DATA" rel="{handler: 'iframe', size: {x: 800, y: 640}, overlayOpacity: 0.9, classWindow: 'phocamaps-plugin-window', classOverlay: 'phocamaps-plugin-overlay'}">
asdxxx
</a>
</td>
<td class="tabcol"></td>
<td class="tabcol">FOURTH+DATA</td>
</tr>
I want to get the FIRST DATA, SECOND DATA, THIRD DATA and FOURTH DATA with a preg_match_all() call. I tried to write several patterns, but none of them worked. Here's what I tried:
preg_match_all('/(<td class="tabcol tdmin_2l">|title=")(.*?)(<\/td>|")/s', $raw, $matches, PREG_SET_ORDER);
What are the correct patterns?
Try this:
$str = <<<HTML
<tr class="tabrow ">
<td class="tabcol tdmin_2l">FIRST+DATA</td>
<td class="tabcol"><a class="modal-button" title="SECOND+DATA" href="THIRD+DATA" rel="{handler: 'iframe', size: {x: 800, y: 640}, overlayOpacity: 0.9, classWindow: 'phocamaps-plugin-window', classOverlay: 'phocamaps-plugin-overlay'}">asdxxx</a></td>
<td class="tabcol"></td>
<td class="tabcol">FOURTH+DATA</td>
</tr>
HTML;
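// The s flag lets . match newlines, so a cell that spans several lines (as in the question's HTML) is still captured.
preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $str, $td_matches);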
preg_match('/ title="([^"]*)"/i', $td_matches[1][1], $title);
preg_match('/ href="([^"]*)"/i', $td_matches[1][1], $href);
echo $td_matches[1][0] . "\n";
echo $title[1] . "\n";
echo $href[1] . "\n";
echo $td_matches[1][3];
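
Since the real page has more than 150 rows, the same idea can be applied row by row. The following is only a minimal sketch: extractRows() is a hypothetical helper name, and it assumes every data row is a <tr> whose class starts with "tabrow" and contains the four cells shown in the question. Rows that don't fit that shape are skipped.

```php
<?php
// Hypothetical helper (not part of the original answer): returns one
// associative array per table row, assuming the row layout from the question.
function extractRows(string $html): array
{
    $rows = [];

    // Capture the inner HTML of each <tr class="tabrow ..."> element.
    // The s flag lets . match newlines, so multi-line rows are handled.
    preg_match_all('/<tr[^>]*class="tabrow[^"]*"[^>]*>(.*?)<\/tr>/is', $html, $trMatches);

    foreach ($trMatches[1] as $rowHtml) {
        // Split the row into its cells.
        preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $rowHtml, $tdMatches);
        $cells = $tdMatches[1];

        if (count($cells) < 4) {
            continue; // not a row with the expected four cells
        }

        // The second cell holds the <a> tag with the title and href attributes.
        preg_match('/\stitle="([^"]*)"/i', $cells[1], $title);
        preg_match('/\shref="([^"]*)"/i', $cells[1], $href);

        $rows[] = [
            'first'  => trim($cells[0]),
            'second' => $title[1] ?? null, // null if the attribute is missing
            'third'  => $href[1] ?? null,
            'fourth' => trim($cells[3]),
        ];
    }

    return $rows;
}
```

Calling extractRows($raw) on the fetched page should then give an array you can loop over with foreach. If the markup turns out to be less regular than the sample, PHP's DOMDocument is usually a more robust choice than regular expressions for parsing HTML.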