
How to use DOMDocument to get child elements?


I am trying to get the text of child elements using the PHP DOM.

Specifically, I am trying to get only the first <a> tag within every <tr>.

The HTML looks like this:

<table>
<tbody>
    <tr>
        <td>
            <a href="#">1st Link</a>
        </td>
        <td>
            <a href="">2nd Link</a>
        </td>
        <td>
            <a href="#">3rd Link</a>
        </td>
    </tr>

    <tr>
        <td>
            <a href="#">1st Link</a>
        </td>
        <td>
            <a href="#">2nd Link</a>
        </td>
        <td>
            <a href="#">3rd Link</a>
        </td>
    </tr>
</tbody>
</table>

My sad attempt at it used nested foreach() loops, but a print_r() on $aVal only ever shows Array().

$dom = new DOMDocument();
libxml_use_internal_errors(true);       
$dom->loadHTML(returnURLData($url));
libxml_use_internal_errors(false);
    
$tables = $dom->getElementsByTagName('table');
$aVal = array();

foreach ($tables as $table) {
    foreach ($table as $tr){
        $trVal = $tr->getElementsByTagName('tr');
        foreach ($trVal as $td){
            $tdVal = $td->getElementsByTagName('td');
            foreach($tdVal as $a){
                $aVal[] = $a->getElementsByTagName('a')->nodeValue;
            }
        }
    }
}
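Two things in this loop get in the way. First, `foreach ($table as $tr)` iterates over the `DOMElement` object itself, not over its rows, because a `DOMElement` is not a collection of its children; the `<tr>` elements have to be fetched with a call such as `$table->getElementsByTagName('tr')`. Second, `getElementsByTagName()` always returns a `DOMNodeList`, even when there is only one match, so `->nodeValue` has to be read from a node taken out of the list with `item()`, not from the list itself. A minimal sketch of the second point, on an assumed one-cell table:

```php
<?php
// Assumed input for illustration: one cell containing one link.
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td><a href="#">1st Link</a></td></tr></table>');

$td = $dom->getElementsByTagName('td')->item(0);

// getElementsByTagName() returns a DOMNodeList, not a single element.
// DOMNodeList has no nodeValue property, which is what the loop above reads.
$links = $td->getElementsByTagName('a');

echo get_class($links), "\n";          // DOMNodeList
echo $links->length, "\n";             // 1
echo $links->item(0)->nodeValue, "\n"; // 1st Link
```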

Am I on the right track or am I completely off?


Solution

  • Put this code in test.php (it needs the Simple HTML DOM Parser library, simple_html_dom.php):

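    require 'simple_html_dom.php';

    $html = file_get_html('test1.php');

    // Loop over every <tr> inside a <table>
    foreach ($html->find('table tr') as $row)
    {
        // find('a', 0) returns only the first <a> in the row (or null),
        // not an array, so it must not be looped over with foreach
        $link = $row->find('a', 0);
        if ($link !== null)
        {
            echo $link->plaintext, "\n";
        }
    }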
    

    and put your HTML in test1.php:

    <table>
        <tbody>
            <tr>
                <td>
                    <a href="#">1st Link</a>
                </td>
                <td>
                    <a href="">2nd Link</a>
                </td>
                <td>
                    <a href="#">3rd Link</a>
                </td>
            </tr>
    
            <tr>
                <td>
                    <a href="#">1st Link</a>
                </td>
                <td>
                    <a href="#">2nd Link</a>
                </td>
                <td>
                    <a href="#">3rd Link</a>
                </td>
            </tr>
        </tbody>
    </table>
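For comparison, the same result can be reached with the built-in `DOMDocument` the question started from, without a third-party library. The sketch below is one possible approach, not the only one: it loops over the `<tr>` node list directly and takes `item(0)` of each row's `<a>` list. A second variant uses `DOMXPath` with the expression `//tr/descendant::a[1]`, which selects the first `<a>` descendant of each `<tr>`. The inline HTML string is an assumption standing in for `returnURLData($url)`, which the question does not show.

```php
<?php
// Inline HTML stands in for returnURLData($url) from the question (assumption).
$html = <<<HTML
<table>
<tbody>
    <tr>
        <td><a href="#">1st Link</a></td>
        <td><a href="">2nd Link</a></td>
        <td><a href="#">3rd Link</a></td>
    </tr>
    <tr>
        <td><a href="#">1st Link</a></td>
        <td><a href="#">2nd Link</a></td>
        <td><a href="#">3rd Link</a></td>
    </tr>
</tbody>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

// Approach 1: plain DOM traversal.
// getElementsByTagName() searches all descendants, so the <table> and <td>
// levels do not need their own loops.
$aVal = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $first = $tr->getElementsByTagName('a')->item(0); // null if the row has no <a>
    if ($first !== null) {
        $aVal[] = trim($first->nodeValue);
    }
}

// Approach 2: XPath. descendant::a[1] is evaluated per <tr>,
// so it picks the first <a> inside each row.
$xpath = new DOMXPath($dom);
$xpVal = [];
foreach ($xpath->query('//tr/descendant::a[1]') as $a) {
    $xpVal[] = trim($a->nodeValue);
}

print_r($aVal);
print_r($xpVal);
```

With the sample table, both arrays should contain "1st Link" twice, one entry per row.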