php - Query a table with DOMXPath


I am trying to read the values of a table on a web page with PHP's DOMXPath::query. When I open the page in my web browser I can see the table, but when I run my query the table is not visible and does not seem to be accessible.

The table has an id, but when I specify it in my query a different table is returned. I want to read the table with the id 'totals', but I only get the one with the id 'per_game'. When I inspect the page's source, a lot of elements appear to be inside comments.

Here is my script:

<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html');
$xpath = new DOMXPath($doc);
$table = $xpath->query("//div[@id='totals']")->item(0);
$elem = $doc->saveXML($table);
echo $elem;
?>

How can I read the elements in the table with the id 'totals'?

The full path is /html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/div[@class="table_outer_container"]/div[@id="div_totals"]/table[@id="totals"]


Solution

  • You can split your query into two parts: first, retrieve the comment in the correct div, then create a new document from its content and query that document for the element you want:

    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    $doc->strictErrorChecking = false;
    $doc->recover = true;
    @$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html');
    $xpath = new DOMXPath($doc);
    
    // retrieve the comment node inside the 'all_totals' div
    $all_totals_element = $xpath->query('/html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/comment()')->item(0);
    
    // a comment node's nodeValue is its text without the <!-- and --> delimiters
    $all_totals_table = $all_totals_element->nodeValue;
    
    // create a new document from the content of the comment
    $tableDoc = new DOMDocument;
    $tableDoc->loadHTML($all_totals_table);
    $xpath = new DOMXPath($tableDoc);
    
    // second part of the query; loadHTML() wraps the fragment in <html><body>,
    // so search with // rather than an absolute path starting at /div
    $totals = $xpath->query('//div[@id="div_totals"]/table[@id="totals"]')->item(0);
    
    echo $tableDoc->saveXML($totals);
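
    The two-step approach above can also be wrapped in a small helper and tried offline. The following is only a sketch under stated assumptions: the function name queryInsideComment, its parameters, and the sample markup are made up for illustration and are not part of the original answer or of the real page. It reads the comment text through nodeValue and searches the second document with a // query, because loadHTML() puts a fragment inside implied html and body elements.

```php
<?php
// Hypothetical helper (name and signature are illustrative only).
// Finds the first comment directly inside the div with id $wrapperId,
// parses the comment text as HTML, and returns the markup of the first
// node matched by $innerQuery, or null when nothing is found.
function queryInsideComment(string $html, string $wrapperId, string $innerQuery): ?string
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Step 1: locate the comment node inside the wrapper div.
    $comment = $xpath->query('//div[@id="' . $wrapperId . '"]/comment()')->item(0);
    if ($comment === null || trim($comment->nodeValue) === '') {
        return null;
    }

    // Step 2: parse the comment text as its own document and query it.
    $inner = new DOMDocument();
    $inner->loadHTML($comment->nodeValue);
    $node = (new DOMXPath($inner))->query($innerQuery)->item(0);

    return $node === null ? null : $inner->saveHTML($node);
}

// Made-up sample markup standing in for the real page.
$sample = '<html><body><div id="all_totals"><!--
<div id="div_totals"><table id="totals"><tr><td>42</td></tr></table></div>
--></div></body></html>';

echo queryInsideComment($sample, 'all_totals', '//table[@id="totals"]');
```

    With the real page, the string passed as $html would be the downloaded page source, and the inner query could stay //table[@id="totals"]. Returning null when the comment or the table is missing lets the caller distinguish "not found" from an empty table.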