
PHP DOMXPath query using the innerHTML/nodeValue of an element to find and return the element


Can you please help me with the correct syntax to use when you want to check the innerHTML/nodeValue of an element?

I have no problem with the name, but the age is inside a plain div element. What is the correct syntax to use in place of "NOT SURE WHAT TO PUT HERE" below?

$html contains the HTML of a page fetched from the internet.

The person's name is in a span like:

<span class="fullname">John Smith</span>

The person's age is in a div like:

<div>Age: 28</div>

I have the following PHP:

<?php
$dom = new DOMDocument();
@$dom->loadHTML($html);
$finder = new DOMXPath($dom);

//Full Name
$findName = "fullname";
$queryName = $finder->query("//span[contains(@class, '$findName')]");
$name = $queryName->item(0)->nodeValue;

//Age
$findAge = "Age: ";
$queryAge = $finder->query("//div[NOT SURE WHAT TO PUT HERE]");
$age = substr($queryAge->item(0)->nodeValue, 5);
?>

Solution

  • Try

    $queryAge = $finder->query("//div[starts-with(., '$findAge')]");
    

    I've had limited success with starts-with() because the element's text often begins with whitespace, so you may have to fall back to contains():

    $queryAge = $finder->query("//div[contains(., '$findAge')]");
    

    If there's a chance of false positives (i.e., other divs with "Age: " in them), you may be able to avoid them by using a more specific path, if you know one, e.g.

    $queryAge = $finder->query("//div[@id='something']//div[contains(., '$findAge')]");
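
As a minimal sketch of the points above, the following puts the pieces together on made-up markup. The sample HTML, the "profile" id, and the variable $loose are assumptions for illustration and are not from the original page. It relies on two standard XPath 1.0 features: normalize-space(), which strips leading and trailing whitespace so that starts-with() can match, and a text() test, which checks only a div's own text nodes. The second matters because "." is the text of the element and all of its descendants, so contains(., ...) also matches any wrapper div around the one you want.

```php
<?php
// Hypothetical sample markup standing in for the fetched page ($html in the question).
$html = <<<HTML
<div id="profile">
  <span class="fullname">John Smith</span>
  <div>
    Age: 28
  </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of silencing them with @
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$findAge = "Age:";

// Loose match: "." includes descendant text, so in this sample both the
// wrapper div and the inner div match, and item(0) is the wrapper.
$loose = $finder->query("//div[contains(., '$findAge')]");

// Stricter match: test each div's own text nodes, with surrounding
// whitespace removed before starts-with() is applied.
$queryAge = $finder->query("//div[text()[starts-with(normalize-space(.), '$findAge')]]");

$age = null;
if ($queryAge !== false && $queryAge->length > 0) {
    // Trim the node text, drop the "Age:" prefix, then trim again.
    $text = trim($queryAge->item(0)->nodeValue);
    $age = (int) trim(substr($text, strlen($findAge)));
}

echo $age, PHP_EOL;
```

Cutting the prefix with strlen($findAge) rather than a hard-coded offset keeps the extraction correct if the label text changes. Checking the result's length before calling item(0) avoids an error when the page has no matching div.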