php html dom domdocument domxpath

Parsing HTML with DOMDocument and DOMXPath


I am loading this markup into an $html variable:

...
...
<table id="tbvalue" class="table_main">
<tr align="center">
<td>
    <div style='background-color:#534522;' ><img src="operation.bmp" border="0" alt="image" width="250" height="60" /></div>
    <br />
</td>
</tr>
<tr align="center">
    <td class="other">
        more text
    </td>
</tr>
<tr align="center">
    <td>
    <input name="name" type="text" id="label" tabindex="1"/>
    </td>
</tr>
<tr>
    <td>
    <span id="lblErrCap" class="errfont"></span>
    </td>
</tr>
</table>
... 
...

Note: I need the first occurrence of <img> inside the table with id="tbvalue". I was trying to do this:

$dom = new DOMDocument;

/*** discard white space (set before loading; changing it afterwards does not alter the parsed tree) ***/
$dom->preserveWhiteSpace = false;

/*** load the html into the object ***/
@$dom->loadHTML($html); // the @ silences the warnings raised for malformed HTML
$xpath = new DOMXPath($dom);

$spans = $xpath->query('//img');
echo $spans->item(0)->getAttribute("src");

But this query ignores the table with id="tbvalue" and simply returns the first <img> in the document.

How can I get the first <img> inside the table with id="tbvalue"?


Solution

  • Do it like this:

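    <?php
    $xpath = new DOMXPath($dom);
    // restrict the search to <img> elements below the table with id="tbvalue"
    $spans = $xpath->query('//table[@id="tbvalue"]//img[1]');
    // item(0) is the first match in document order
    echo $spans->item(0)->getAttribute("src");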
    

    The // operator selects matching nodes anywhere below the current node, no matter how deeply they are nested.
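To make the difference between / and // concrete, here is a minimal sketch. The markup is a trimmed, hypothetical version of the table from the question, and the variable names $direct and $nested are only for illustration.

```php
<?php
// Hypothetical sample: the <img> sits several levels below the <table>.
$html = '<table id="tbvalue"><tr><td><div>'
      . '<img src="operation.bmp" alt="image" />'
      . '</div></td></tr></table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// A single "/" only looks at direct children of the table, so the nested <img> is not found.
$direct = $xpath->query('//table[@id="tbvalue"]/img');

// "//" looks at all descendants of the table, at any depth.
$nested = $xpath->query('//table[@id="tbvalue"]//img');

echo $direct->length, "\n"; // 0
echo $nested->length, "\n"; // 1
```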

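One detail worth knowing: in XPath, a predicate such as img[1] is evaluated per parent element, so //table[@id="tbvalue"]//img[1] can return several nodes (every <img> that is the first <img> child of its parent). Taking item(0) still gives the first one in document order, but if you want the query itself to return exactly one node, you can wrap the path in parentheses. The sketch below assumes hypothetical markup with a second image (banner.png and second.png are made up for illustration), and it uses libxml_use_internal_errors() as an alternative to the @ operator.

```php
<?php
// Hypothetical markup: one <img> before the table and two inside it.
$html = '<p><img src="banner.png" alt="banner" /></p>'
      . '<table id="tbvalue">'
      . '<tr><td><div><img src="operation.bmp" alt="image" /></div></td></tr>'
      . '<tr><td><img src="second.png" alt="second" /></td></tr>'
      . '</table>';

libxml_use_internal_errors(true); // collect parser warnings instead of silencing with @
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// img[1] is applied per parent: both images inside the table match,
// because each one is the first <img> child of its own parent.
$perParent = $xpath->query('//table[@id="tbvalue"]//img[1]');

// The parentheses collect all matches first; [1] then keeps only the first in document order.
$first = $xpath->query('(//table[@id="tbvalue"]//img)[1]');

echo $perParent->length, "\n"; // 2
echo $first->length, "\n";     // 1

// Guard against an empty result before calling getAttribute().
if ($first->length > 0) {
    echo $first->item(0)->getAttribute('src'), "\n"; // operation.bmp
}
```

Either query works with item(0) for the markup in the question; the parenthesized form just makes the intent explicit.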