php, regex, preg-split

Get all tds from a table with preg_split


I want to get all occurrences of tds in a string. At the moment I'm using $tds = preg_split( '#(?=<td>)#', $toDisplayNotes );

but this does not get all of the tds. Is it possible to produce an array that looks like this:

array {
  [0] => "<td>hello</td>"
  [1] => "<td align="right">world</td>"
  [2] => "<td>another td</td>"
}

Solution

  • Using the DOMDocument class, you can easily get all cells like so:

    $dom = new DOMDocument;
    $dom->loadHTML($htmlString);
    $cells = $dom->getElementsByTagName('td');
    $contents = array();
    foreach($cells as $cell)
    {
        $contents[] = $cell->nodeValue;
    }
    var_dump($contents);
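
    As a minimal, self-contained sketch of the snippet above: the $htmlString value here is a hypothetical input assembled from the three cells in the question, not the asker's real data. It shows that nodeValue yields only the text of each cell, without the tags.

    ```php
    <?php
    // Hypothetical sample input, built from the cells shown in the question.
    $htmlString = '<table><tr>'
        . '<td>hello</td>'
        . '<td align="right">world</td>'
        . '<td>another td</td>'
        . '</tr></table>';

    $dom = new DOMDocument;
    $dom->loadHTML($htmlString);

    $contents = array();
    foreach ($dom->getElementsByTagName('td') as $cell) {
        // nodeValue is the text content only; the <td> tags are not included.
        $contents[] = $cell->nodeValue;
    }

    print_r($contents);
    ```

    With this input, $contents holds the three strings "hello", "world" and "another td".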
    

    The $cells variable is a DOMNodeList, so it has a few members you might be able to use (such as length and item()). On each iteration, $cell is assigned a DOMElement (a subclass of DOMNode), which has all sorts of methods and properties that could be useful for your use case, too (like getAttribute).
    Looking at your question, though, you'll want the outer HTML (including the tags) in your array. That's easy, too:

    $markup = array();
    foreach($cells as $cell)
    {
        $markup[] = $dom->saveXML($cell);
    }
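
    Here is a small runnable sketch that combines the outer markup with an attribute lookup. The input string and the $rows array layout are made up for illustration; only DOMDocument, saveXML and getAttribute come from the answer itself.

    ```php
    <?php
    // Hypothetical input with one plain cell and one cell carrying an attribute.
    $htmlString = '<table><tr><td>hello</td><td align="right">world</td></tr></table>';

    $dom = new DOMDocument;
    $dom->loadHTML($htmlString);

    $rows = array();
    foreach ($dom->getElementsByTagName('td') as $cell) {
        $rows[] = array(
            // Serialize just this node, opening and closing tags included.
            'markup' => $dom->saveXML($cell),
            // getAttribute returns an empty string when the attribute is absent.
            'align'  => $cell->getAttribute('align'),
        );
    }

    print_r($rows);
    ```

    The 'markup' entries are the strings the question asks for, and 'align' shows how the attributes remain available without any further string parsing.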
    

    Side-note:
    Perhaps a for loop will be more performant than foreach. I haven't tested or compared the two, but you could try both the approach above and the one below, and see whether there's a difference:

    $markup = array();
    for($i=0, $j = $cells->length;$i<$j;$i++)
    {
        $markup[] = $dom->saveXML($cells->item($i));
    }
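
    If you want to compare the two loops yourself, a rough timing sketch could look like the following. The 2000-cell input is an arbitrary assumption, and single-run microtime measurements are noisy, so treat the numbers as a hint rather than a benchmark.

    ```php
    <?php
    // Hypothetical input: one row with 2000 cells, just to give the loops some work.
    $htmlString = '<table><tr>' . str_repeat('<td>x</td>', 2000) . '</tr></table>';

    $dom = new DOMDocument;
    $dom->loadHTML($htmlString);
    $cells = $dom->getElementsByTagName('td');

    // Time the foreach variant.
    $start = microtime(true);
    $viaForeach = array();
    foreach ($cells as $cell) {
        $viaForeach[] = $dom->saveXML($cell);
    }
    $foreachSeconds = microtime(true) - $start;

    // Time the for variant with item().
    $start = microtime(true);
    $viaFor = array();
    for ($i = 0, $j = $cells->length; $i < $j; $i++) {
        $viaFor[] = $dom->saveXML($cells->item($i));
    }
    $forSeconds = microtime(true) - $start;

    printf("foreach: %.4f s, for: %.4f s\n", $foreachSeconds, $forSeconds);
    ```

    Both loops build the same array, so whichever turns out faster on your PHP version can be swapped in without changing the result.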
    

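    The reason why I'm using saveXML and not saveHTML is simple: called without arguments, saveHTML serializes the whole document (including the <html> and <body> tags that loadHTML adds, and what have you). That's not what you want, which is why saveXML with the node passed in is, in this case, the better choice. Note that as of PHP 5.3.6, saveHTML also accepts a node argument, so $dom->saveHTML($cell) is an option on newer versions.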
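
    To see the difference between the serializers concretely, here is a short sketch. The one-cell input is a made-up example, and the variable names are only for illustration.

    ```php
    <?php
    // Hypothetical one-cell input.
    $dom = new DOMDocument;
    $dom->loadHTML('<table><tr><td>hello</td></tr></table>');
    $cell = $dom->getElementsByTagName('td')->item(0);

    $whole   = $dom->saveHTML();      // entire document, wrappers included
    $viaXml  = $dom->saveXML($cell);  // only the <td> element
    $viaHtml = $dom->saveHTML($cell); // only the <td> element (node argument, PHP 5.3.6+)

    echo $whole, "\n";
    echo $viaXml, "\n";
    echo $viaHtml, "\n";
    ```

    The first echo prints the full document, including the <html> and <body> wrappers added while parsing, whereas the other two print just the cell.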