php xpath domdocument domxpath

Add content of a table into an array using DOMDocument


I have $html:

$html = '
<table id="myTable">
    <tbody>
        <tr>
            <td>08/20/18</td>
            <td> <a href="https://example.com/1a">Text 1 A</a> </td>
            <td> <a href="https://example.com/1b">Test 1 B</a> </td>
        </tr>
        <tr>
            <td>08/21/18</td>
            <td> <a href="https://example.com/2a">Text 2 A</a> </td>
            <td> <a href="https://example.com/2b">Test 2 B</a> </td>
        </tr>
    </tbody>
</table>
';

Using DOMDocument, I want to add the content of the table into a multidimensional $array:

$array = array(
    // tr 1
    array(
        array(
            'content' => '08/20/18'
        ),
        array(
            'content' => 'Text 1 A',
            'href' => 'https://example.com/1a'
        ),
        array(
            'content' => 'Test 1 B',
            'href' => 'https://example.com/1b'
        )
    ),
    // tr 2
    array(
        array(
            'content' => '08/21/18'
        ),
        array(
            'content' => 'Text 2 A',
            'href' => 'https://example.com/2a'
        ),
        array(
            'content' => 'Test 2 B',
            'href' => 'https://example.com/2b'
        )
    )
);

What I've tried so far

I've managed to select the table and read its text content using XPath:

// setup DOMDocument
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html); 
$xpath = new DOMXPath($doc);
// target table using xpath
$results = $xpath->query("//*[@id='myTable']");

if ($results->length > 0) {
    var_dump($results->item(0));
    var_dump($results->item(0)->nodeValue);
}
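
Note that nodeValue flattens the whole table into a single string, so the row and cell structure is lost. As a small sketch for checking what the query matched, DOMDocument::saveHTML() accepts a node and returns only that node's markup, and a query relative to the table can count its rows. The shortened $html and the variable names $markup and $rowCount are only for illustration:

```php
<?php

// Shortened sample table, for illustration only
$html = '<table id="myTable"><tbody><tr><td>08/20/18</td>'
    . '<td> <a href="https://example.com/1a">Text 1 A</a> </td></tr></tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);
$results = $xpath->query("//*[@id='myTable']");

$table = $results->item(0);
$markup = '';
$rowCount = 0;
if ($table !== null) {
    // saveHTML() with a node argument serializes just that node
    $markup = $doc->saveHTML($table);
    // the second argument makes the query relative to the table
    $rowCount = $xpath->query('.//tr', $table)->length;
}

echo $markup, PHP_EOL;
echo $rowCount, PHP_EOL;
```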

What is the best way to put the content of each tr into the $array?


Solution

    <?php
    
    $html = '
    <table id="myTable">
        <tbody>
            <tr>
                <td>08/20/18</td>
                <td> <a href="https://example.com/1a">Text 1 A</a> </td>
                <td> <a href="https://example.com/1b">Test 1 B</a> </td>
            </tr>
            <tr>
                <td>08/21/18</td>
                <td> <a href="https://example.com/2a">Text 2 A</a> </td>
                <td> <a href="https://example.com/2b">Test 2 B</a> </td>
            </tr>
        </tbody>
    </table>
    ';
    
    $data = [];
    
    $doc = new DOMDocument();
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    $xpath = new DOMXPath($doc);
    $trs = $xpath->query("//*[@id='myTable']/tbody/tr");
    foreach ($trs as $i => $tr) {
        foreach ($tr->childNodes as $td) {
            // skip the whitespace text nodes between the <td> elements
            if (!$td instanceof DOMElement) {
                continue;
            }
            // trim the spaces that surround the link text inside a cell
            $row = ['content' => trim($td->nodeValue)];
            foreach ($td->childNodes as $a) {
                // only element children (here the <a>) carry attributes
                if ($a->hasAttributes()) {
                    /** @var DOMAttr $attribute */
                    foreach ($a->attributes as $attribute) {
                        $row[$attribute->name] = $attribute->value;
                    }
                }
            }
            $data[$i][] = $row;
        }
    }
    
    var_dump($data);
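
As an alternative sketch, the childNodes loops and node-type checks can be replaced by XPath queries relative to each row and cell, using the second argument of DOMXPath::query(). The function name tableToArray is hypothetical and not part of the answer above. The sketch assumes at most one link per cell, that only the href attribute is wanted, that the table has no nested tables, and that the id contains no single quote:

```php
<?php

/**
 * Hypothetical helper: converts the rows of the table with the given id
 * into a nested array of cells ('content', plus 'href' when a link exists).
 */
function tableToArray(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    $xpath = new DOMXPath($doc);

    $data = [];
    // "//tr" matches rows whether or not the markup contains a <tbody>
    foreach ($xpath->query("//*[@id='" . $tableId . "']//tr") as $tr) {
        $row = [];
        // passing $tr as context node makes "./td" relative to this row
        foreach ($xpath->query('./td', $tr) as $td) {
            $cell = ['content' => trim($td->textContent)];
            // first link in the cell, if any
            $link = $xpath->query('.//a[@href]', $td)->item(0);
            if ($link instanceof DOMElement) {
                $cell['href'] = $link->getAttribute('href');
            }
            $row[] = $cell;
        }
        $data[] = $row;
    }

    return $data;
}

$html = '
<table id="myTable">
    <tbody>
        <tr>
            <td>08/20/18</td>
            <td> <a href="https://example.com/1a">Text 1 A</a> </td>
            <td> <a href="https://example.com/1b">Test 1 B</a> </td>
        </tr>
        <tr>
            <td>08/21/18</td>
            <td> <a href="https://example.com/2a">Text 2 A</a> </td>
            <td> <a href="https://example.com/2b">Test 2 B</a> </td>
        </tr>
    </tbody>
</table>
';

$data = tableToArray($html, 'myTable');
var_dump($data);
```

Reading href with getAttribute() keeps unrelated attributes (class, target and so on) out of the result, whereas copying every attribute, as in the loop above, would include them.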