How to push DOM HTML results into an array in PHP?


The scrape below returns a flat sequence of values:

  • line 1: date
  • line 2: home team
  • line 3: score
  • line 4: away team (the same 4 elements repeat until the end)

I have tried everything I can think of to convert this into an array, but nothing I add to the code gives the desired result, which would be a loop producing something like:

[1]date [2]home [3]score [4]away etc.

until the end of the document.

<?php
$html = file_get_contents('http://www.soccerstats.com/round_details.asp?league=brazil'); // get the HTML returned from this URL

$doc = new DOMDocument();

libxml_use_internal_errors(TRUE);

if (!empty($html)) {

    $doc->loadHTML($html);
    libxml_clear_errors(); // remove errors (html)

    $xpath = new DOMXPath($doc);

    $rows = $xpath->query('//b/font');

    if ($rows->length > 0) {
        foreach ($rows as $row) {
            // $array[] = $row->nodeValue . "<br/>";
            $array = $row->nodeValue . "<br/>";

            print_r($array);
        }
    }
}

?>

Results:

1 Jun 14
 Fluminense 
1 - 1
 Internacional 
1 Jun 14
 Vitória 
0 - 1
 Sport Recife 
1 Jun 14
 Corinthians 
1 - 1
 Botafogo 
1 Jun 14
 Chapecoense 
2 - 1
 Bahia 
1 Jun 14
 Cruzeiro 
3 - 0
 Flamengo 
1 Jun 14
 Santos 
2 - 0
 Criciúma 
1 Jun 14
 Grêmio 
0 - 0
 Palmeiras 
1 Jun 14
 Figueirense 
1 - 3
 Atlético PR 
31 May 14
 São Paulo 
2 - 1
 Atlético MG 
31 May 14
 Coritiba 
3 - 0
 Goiás 
30 May 14
 Bahia 
0 - 2
 Santos 
29 May 14
 Internacional 
2 - 0
 Chapecoense 
29 May 14
 Flamengo 
1 - 1
 Figueirense 
29 May 14
 Atlético MG 
2 - 0
 Fluminense 
29 May 14
 Atlético PR 
2 - 2
 São Paulo 
29 May 14
 Corinthians 
1 - 0
 Cruzeiro 
29 May 14
 Goiás 
0 - 0
 Vitória  

Solution

  • You are already on the right path. First select each table that holds a match, then use getElementsByTagName within that table to reach the values you want.

    Consider this example:

    $data = array();
    $html = file_get_contents('http://www.soccerstats.com/round_details.asp?league=brazil'); // get the HTML returned from this URL
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);

    if (!empty($html)) {
        // Prepend a charset hint so the parser reads the markup as UTF-8
        $doc->loadHTML('<meta http-equiv="content-type" content="text/html; charset=utf-8">' . $html);
        libxml_clear_errors();
        $xpath = new DOMXPath($doc);

        // Each table with class "stat" holds one match
        $entries = $xpath->query('//table[@class="stat"]');
        foreach ($entries as $table) {
            // Look up the <font> nodes once per table
            $fonts = $table->getElementsByTagName('font');
            if ($fonts->length < 4) {
                continue; // skip tables that lack all four values
            }

            $data[] = array(
                'date'  => trim($fonts->item(0)->nodeValue),
                'home'  => trim($fonts->item(1)->nodeValue),
                'score' => trim($fonts->item(2)->nodeValue),
                'away'  => trim($fonts->item(3)->nodeValue),
            );
        }
    }

    echo "<pre>";
    print_r($data);
    echo "</pre>";
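
An alternative worth considering, if you would rather keep the original `//b/font` query: collect the flat list first, then group it in fours with array_chunk. The sketch below is only illustrative. The inline markup is a hypothetical stand-in for the real page (its actual structure is not shown in the question), and the names `$values` and `$matches` are made up for the example. It also shows the difference between `$array[] = ...`, which appends, and `$array = ...`, which overwrites the variable on every pass.

```php
<?php
// Hypothetical markup standing in for the real page: four <b><font> cells per match.
$html = <<<HTML
<html><body>
<table><tr>
  <td><b><font>1 Jun 14</font></b></td>
  <td><b><font> Fluminense </font></b></td>
  <td><b><font>1 - 1</font></b></td>
  <td><b><font> Internacional </font></b></td>
</tr></table>
<table><tr>
  <td><b><font>1 Jun 14</font></b></td>
  <td><b><font> Corinthians </font></b></td>
  <td><b><font>1 - 1</font></b></td>
  <td><b><font> Botafogo </font></b></td>
</tr></table>
</body></html>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Step 1: build a flat list. The [] appends one element per node;
// a plain "=" would replace the whole variable each time.
$values = array();
foreach ($xpath->query('//b/font') as $node) {
    $values[] = trim($node->nodeValue);
}

// Step 2: split the flat list into groups of four and name the fields.
$matches = array();
foreach (array_chunk($values, 4) as $chunk) {
    if (count($chunk) === 4) { // ignore an incomplete trailing group
        $matches[] = array_combine(array('date', 'home', 'score', 'away'), $chunk);
    }
}

print_r($matches);
```

This approach depends on every match contributing exactly four `<b><font>` nodes in order. If the page has any other `<b><font>` text, the groups will shift, which is why selecting per table as in the example above is the safer choice.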
    
