
Screen scraping a two-column table with PHP


Sounds simple enough, but I'm new to screen scraping. I have a remote site, http://www.remotesite.com (an example URL), with a schedule table structured like this:

<table>
  <tr>
    <td class="team">
      Team 1
    </td>
    <td class="team">
      Team 2
    </td>
  </tr>
</table>

The table has a varying number of rows depending on how many games are scheduled that day, with one row per matchup (Team 1 vs Team 2, and so on).

I've built my scraper to get a list of all the teams listed in the table and it works successfully. Here's the code:

<?php
// Load Simple DOM
    include_once("simple_html_dom.php");
    
// Scrape the Schedule
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // DOMDocument::loadHTML() expects a plain HTML string, so fetch the page as text
    $html = file_get_contents("http://www.remotesite.com/schedule.htm");
    
    // Load HTML
        $dom->loadHTML($html);
        $xpath = new DOMXPath($dom);

    // Get all the Teams
        $my_xpath_query = "//table//td[contains(@class, 'team')]";
        $result_rows = $xpath->query($my_xpath_query);

?>
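To see what that query returns without hitting the remote site, here is a minimal sketch that runs the same XPath expression against a hypothetical inline copy of the schedule markup (the `$html` heredoc is an assumption standing in for the real page). It shows that the result is one flat list in document order, with the `<tr>` boundaries gone, and that `nodeValue` keeps the whitespace around each name.

```php
<?php
// Hypothetical inline stand-in for the remote schedule page.
$html = <<<HTML
<table>
  <tr>
    <td class="team">
      Team 1
    </td>
    <td class="team">
      Team 2
    </td>
  </tr>
  <tr>
    <td class="team">Team 3</td>
    <td class="team">Team 4</td>
  </tr>
</table>
HTML;

// Suppress libxml warnings about imperfect real-world HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same query as in the question: every team cell, in document order.
$result_rows = $xpath->query("//table//td[contains(@class, 'team')]");

// nodeValue includes the surrounding newlines and spaces, so trim each name.
$teams = [];
foreach ($result_rows as $node) {
    $teams[] = trim($node->nodeValue);
}

// One flat list: nothing records which row each team came from.
echo implode(', ', $teams), "\n";
```

This prints `Team 1, Team 2, Team 3, Team 4`, which is why echoing each `nodeValue` with no separator produces the run-together output below.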

To display the results, I have this code:

<?php
    // Display the schedule
        foreach ($result_rows as $result_object){
            echo $result_object->nodeValue;
        }
?>

However, this prints all the teams run together:

Team1Team2Team3Team4Team5Team6 etc, etc.

The teams come back in the correct order, so each consecutive pair is a matchup, but what I need is to output them as a table, one matchup per row, the same way they are laid out on the page I'm scraping.

Thanks in advance for any help you can give me!


Solution

Based on your answers to my questions, I'd suggest something like this:

    $rows = '';
    $teams = array();
    
    // Pull team names into an array, trimming the whitespace around each name
    foreach ($result_rows as $result_object){
       $teams[] = trim($result_object->nodeValue);
    }
    
    // Extract two teams per table row, escaping the names for HTML output
    while(count($teams)){
       $matchup = array_map('htmlspecialchars', array_splice($teams, 0, 2));
       $rows .= '<tr><td>'.implode('</td><td>', $matchup).'</td></tr>';
    }
    
    // Write out the table
    echo "<table>$rows</table>";
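A different technique, not the one suggested above, is to keep the row structure while scraping instead of re-pairing a flat list afterwards: query the `<tr>` elements first, then run a relative XPath query for the team cells inside each row by passing the row as the context node to `DOMXPath::query()`. The sketch below assumes the same `class="team"` markup and uses a hypothetical inline `$html` string in place of the remote page.

```php
<?php
// Hypothetical inline stand-in for the remote schedule page.
$html = <<<HTML
<table>
  <tr><td class="team">Team 1</td><td class="team">Team 2</td></tr>
  <tr><td class="team">Team 3</td><td class="team">Team 4</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = '';
$matchups = [];

// Every <tr> that contains at least one team cell.
foreach ($xpath->query("//table//tr[td[contains(@class, 'team')]]") as $tr) {
    // Relative query: only the team cells that are children of this <tr>.
    $cells = $xpath->query("td[contains(@class, 'team')]", $tr);

    $names = [];
    foreach ($cells as $td) {
        $names[] = trim($td->nodeValue);
    }
    $matchups[] = $names;

    // Escape the names only when building the output markup.
    $escaped = array_map('htmlspecialchars', $names);
    $rows .= '<tr><td>' . implode('</td><td>', $escaped) . '</td></tr>';
}

echo "<table>$rows</table>\n";
```

Because each row is read as a unit, this does not assume exactly two teams per row. If you prefer the flat-list approach, `foreach (array_chunk($teams, 2) as $matchup)` is an equivalent way to group the teams in pairs.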