PHP's DomXPath not working as expected


I'm trying to parse this HTML page: http://www.valor.com.br/valor-data/moedas

As a simple start, I'm trying to get all td elements with class="left" and echo their text content. What I'm struggling to understand is why this code:

    $finder = new DomXPath($dom);
    $tds = $finder->query("//*[@class='left']");
    foreach ($tds as $td) {
        echo $td->textContent;
    }

gives me the expected output (a bunch of words that belong to those td elements which aren't worth pasting here) while this:

    $finder = new DomXPath($dom);
    $tds = $finder->query("//td[@class='left']");
    foreach ($tds as $td) {
        echo $td->textContent;
    }

finds nothing. I've also tried $finder->query("//td") to simply get all td elements, but that returns nothing either, as if DomXPath doesn't recognize tag names. Has anyone faced this same problem?


Solution

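  • I have not tested this, but it is probably a namespace issue. Your input page is XHTML and correctly declares the XHTML namespace, so every element in it, including td, belongs to that namespace. In XPath 1.0 an unprefixed name such as td matches only elements in no namespace, while the wildcard * matches elements in any namespace. That is why your first query works and the second one finds nothing. Therefore, you need to register a namespace prefix and use that prefix in your query.

    Something like this: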

    $finder = new DomXPath($dom);
    $finder->registerNamespace("x", "http://www.w3.org/1999/xhtml");
    $tds = $finder->query("//x:td[@class='left']");
    foreach ($tds as $td) {
        echo $td->textContent;
    }
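
    To check the namespace explanation without fetching the real page, here is a minimal sketch. The inline XHTML string and the variable names ($xhtml, $plain, $wild, $prefixed, $local) are made up for illustration, and it assumes the document is parsed as XML (loadXML), since that is what keeps the default namespace on the elements. It also shows local-name() as an alternative that avoids registering a prefix.

```php
<?php
// Hypothetical stand-in for the real page: a tiny XHTML document
// that declares the XHTML namespace as its default namespace.
$xhtml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><td class="left">Dollar</td><td class="right">1.00</td></tr>
      <tr><td class="left">Euro</td><td class="right">2.00</td></tr>
    </table>
  </body>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xhtml); // assumption: parsed as XML, so namespaces are kept

$finder = new DOMXPath($dom);

// Unprefixed name test: matches only td elements in no namespace.
$plain = $finder->query("//td[@class='left']");

// Wildcard: matches elements regardless of namespace.
$wild = $finder->query("//*[@class='left']");

// Prefix bound to the XHTML namespace; "x" is an arbitrary choice.
$finder->registerNamespace("x", "http://www.w3.org/1999/xhtml");
$prefixed = $finder->query("//x:td[@class='left']");

// Alternative without a prefix: compare the local name only.
$local = $finder->query("//*[local-name()='td'][@class='left']");

echo "plain: ", $plain->length, "\n";
echo "wildcard: ", $wild->length, "\n";
echo "prefixed: ", $prefixed->length, "\n";
echo "local-name: ", $local->length, "\n";

foreach ($prefixed as $td) {
    echo $td->textContent, "\n";
}
```

    With this sample the unprefixed query should match nothing, while the wildcard, prefixed, and local-name() queries each match the two td elements with class="left". The local-name() form is handy for one-off queries, but the registered prefix is usually easier to read once a query has several steps.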