php, html, parsing, get

PHP: read HTML and handle an ID that appears twice


For my project I'm reading an external website that uses the same ID twice. I can't change that.

I need the content from the second occurrence of that ID, but my code returns only the first one and never sees the second. Calling count() on $data also gives 1 rather than 2. I'm desperate. Does anyone have an idea how to access the second element with the ID 'hours'?

<?PHP
  $url = 'myurl';
  $contents = file_get_contents($url);
  $dom = new DOMDocument();
  libxml_use_internal_errors(true);
  $dom->loadHTMLFile($url);
  $data = $dom->getElementById("hours");
  echo $data->nodeValue."\n";
  echo count($data);
?>

Solution

  • As @rickdenhaan points out, getElementById always returns a single element: the first one whose id has that value. However, you can use DOMXPath to find all nodes with a given id value and then pick out the one you want (this code selects the second one):

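    $url = 'myurl';
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);  // keep parse warnings (e.g. for the duplicate id) out of the output
    $dom->loadHTMLFile($url);          // fetches and parses the page, so a separate file_get_contents() is not needed
    $xpath = new DOMXPath($dom);
    $count = 0;
    foreach ($xpath->query("//*[@id='hours']") as $node) {
        // $count is 0-based, so 1 means the second match
        if ($count == 1) echo $node->nodeValue;
        $count++;
    }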
    

    As @NigelRen points out in the comments, you can simplify this further by selecting the second matching element directly in the XPath expression, i.e.:

    $node = $xpath->query("(//*[@id='hours'])[2]")[0];
    echo $node->nodeValue;
    

    Demo on 3v4l.org
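
To try the XPath approach without hitting the external site, here is a minimal self-contained sketch. The inline markup and the helper name `nthById` are made up for illustration and are not part of the original question or answer. The sketch also assumes the id value contains no single quote, since it is interpolated into the XPath string. It uses `item(0)` and checks for an empty result, so a missing n-th match yields `null` instead of an error.

```php
<?php
// Hypothetical sample markup standing in for the external page; the id "hours" is duplicated on purpose.
$html = <<<'HTML'
<!DOCTYPE html>
<html><body>
  <div id="hours">Mon-Fri 9-17</div>
  <div id="hours">Sat 10-14</div>
</body></html>
HTML;

libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();             // discard the collected warnings

$xpath = new DOMXPath($dom);

/**
 * Hypothetical helper: text of the n-th (1-based) element carrying the given id,
 * or null when there is no such element.
 */
function nthById(DOMXPath $xpath, string $id, int $n): ?string
{
    // Assumption: $id contains no single quote, so it can be embedded in the expression as-is.
    $nodes = $xpath->query(sprintf("(//*[@id='%s'])[%d]", $id, $n));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

$first  = nthById($xpath, 'hours', 1);
$second = nthById($xpath, 'hours', 2);
$third  = nthById($xpath, 'hours', 3);                 // no third match in the sample markup
$total  = $xpath->query("//*[@id='hours']")->length;   // number of elements sharing the id

echo $second, "\n";
echo $total, "\n";
```

The parentheses in `(//*[@id='hours'])[2]` matter: they apply the position test to the whole result set in document order. Without them, `//*[@id='hours'][2]` would test each element's position among its siblings instead.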