Tags: php, dom, php-5.3, html-parsing

PHP DOM: parsing an HTML list into an array?


I have the HTML string below, and I would like to turn it into an array with one entry per element.

$string = '
<a href="#" class="something">1</a>
<a href="#" class="something">2</a>
<a href="#" class="something">3</a>
<a href="#" class="something">4</a>
';

Here's my current code with DOMDocument:

$dom = new DOMDocument;
$dom->loadHTML($string);
foreach( $dom->getElementsByTagName('a') as $node)
{
    $array[] = $node->nodeValue; 
}

print_r($array);

However, this gives the below output:

Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4 )

But I am looking for this result:

Array ( 
[0] => <a href="#" class="something">1</a>
[1] => <a href="#" class="something">2</a> 
[2] => <a href="#" class="something">3</a>
[3] => <a href="#" class="something">4</a>
)

Is this possible?


Solution

  • Pass the node to DOMDocument::saveHTML to get its HTML representation:

    $string = '
    <a href="#" class="something">1</a>
    <a href="#" class="something">2</a>
    <a href="#" class="something">3</a>
    <a href="#" class="something">4</a>
    ';
    
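    $dom = new DOMDocument;
    $dom->loadHTML($string);

    // Collect the outer HTML of each <a> element.
    $array = array();
    foreach($dom->getElementsByTagName('a') as $node)
    {
        // Passing a node makes saveHTML() serialize only that node.
        $array[] = $dom->saveHTML($node);
    }
    
    print_r($array);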
    

    Result:

    Array
    (
        [0] => <a href="#" class="something">1</a>
        [1] => <a href="#" class="something">2</a>
        [2] => <a href="#" class="something">3</a>
        [3] => <a href="#" class="something">4</a>
    )
    

    The node argument to saveHTML() was added in PHP 5.3.6, so this only works with PHP 5.3.6 and higher.
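
If you are stuck on a PHP version older than 5.3.6, one possible workaround is DOMDocument::saveXML, which has accepted a node argument for longer. The sketch below is only an illustration: the helper name outerMarkup and the $links variable are made up for this example and are not part of the original answer. Note that the output is XML serialization, so for simple links like these it matches the HTML, but empty elements such as br would come out self-closed.

```php
<?php
// Hypothetical helper (not from the original answer): serialize a single
// node via saveXML() on the document that owns it.
function outerMarkup(DOMNode $node)
{
    return $node->ownerDocument->saveXML($node);
}

$string = '
<a href="#" class="something">1</a>
<a href="#" class="something">2</a>
';

$dom = new DOMDocument;
$dom->loadHTML($string);

// Collect the serialized markup of each <a> element.
$links = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    $links[] = outerMarkup($node);
}

print_r($links);
```

On PHP 5.3.6 and higher, saveHTML($node) as shown in the answer is the better fit, since it serializes using HTML rules rather than XML rules.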