Tags: php, dom, domdocument

How to get parent and nested elements by DOMDocument?


Given typical HTML such as:

<ol>
   <li>
      <span>parent</span>
      <ul>
         <li><span>nested 1</span></li>
         <li><span>nested 2</span></li>
      </ul>
   </li>
</ol>

I am trying to get the contents of the <li> elements, but I need to get the parent items and the items nested under <ul> separately.

If I do this:

$ols = $doc->getElementsByTagName('ol');

foreach($ols as $ol){

    $lis = $ol->getElementsByTagName('li');
    // here I need li immediately under <ol>

}

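$lis contains every <li> element below the <ol>, both the parent and the nested ones, because getElementsByTagName() searches all descendants, not just direct children.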

How can I get only the <li> elements one level under <ol>, ignoring deeper levels?


Solution

  • There are two approaches to this. The first builds on the getElementsByTagName() call you are already using: pick out the first <li> tag and assume that it is the top-level one.

    $ols = $doc->getElementsByTagName('ol');
    
    foreach($ols as $ol){
        // First <li> in document order; for this markup that is the top-level one
        $li = $ol->getElementsByTagName('li')->item(0);
        echo $doc->saveHTML($li).PHP_EOL;
    }
    

    This echoes...

    <li>
          <span>parent</span>
          <ul>
             <li><span>nested 1</span></li>
             <li><span>nested 2</span></li>
          </ul>
    </li>
    

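    This works for this markup, but it is not exact: it only returns the first <li> under each <ol>, so any further top-level items in the same list are missed.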
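    If you want to stay with plain DOM methods but still get every top-level item, one option (not part of the original answer, just a sketch assuming the markup from the question) is to walk each <ol>'s childNodes and keep only the <li> elements. The $html string and the $topLevel array are illustrative names.

```php
<?php
// Sketch: collect only the <li> elements that are direct children of each <ol>
// by walking childNodes instead of calling getElementsByTagName().
$html = '<ol>
   <li>
      <span>parent</span>
      <ul>
         <li><span>nested 1</span></li>
         <li><span>nested 2</span></li>
      </ul>
   </li>
</ol>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$topLevel = [];
foreach ($doc->getElementsByTagName('ol') as $ol) {
    foreach ($ol->childNodes as $child) {
        // childNodes also holds whitespace text nodes, so keep only <li> elements
        if ($child instanceof DOMElement && $child->nodeName === 'li') {
            $topLevel[] = $child;
        }
    }
}

foreach ($topLevel as $li) {
    echo $doc->saveHTML($li), PHP_EOL;
}
```

    Unlike picking item(0), this returns every direct <li> child, and the nested items are never visited because the loop does not descend into them.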

    The other method is to use XPath, where you can specify exactly which level of the document you want to retrieve. This uses //ol/li, which selects every <li> element that is a direct child of an <ol> element.

    $xp = new DOMXPath($doc);
    $lis = $xp->query("//ol/li");
    
    foreach ( $lis as $li ) {
        echo $doc->saveHTML($li);
    }
    

    This also gives:

    <li>
          <span>parent</span>
          <ul>
             <li><span>nested 1</span></li>
             <li><span>nested 2</span></li>
          </ul>
    </li>
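    To keep the loop from the question and also separate the parent from its nested items, the XPath query can be evaluated relative to each <ol> by passing it as the context node to DOMXPath::query(). This is a sketch assuming the markup from the question and that every top-level <li> has a <span> holding its label; the $result array shape is only for illustration.

```php
<?php
// Sketch: relative XPath queries inside a foreach over <ol>, separating each
// top-level <li> from the <li> items nested under its <ul>.
$html = '<ol>
   <li>
      <span>parent</span>
      <ul>
         <li><span>nested 1</span></li>
         <li><span>nested 2</span></li>
      </ul>
   </li>
</ol>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$result = [];
foreach ($doc->getElementsByTagName('ol') as $ol) {
    // 'li' relative to $ol matches only its direct <li> children
    foreach ($xp->query('li', $ol) as $li) {
        // Assumes each top-level <li> has a direct <span> child, as in the example
        $parent = trim($xp->query('span', $li)->item(0)->textContent);

        // 'ul/li' relative to $li matches the items one level further down
        $nested = [];
        foreach ($xp->query('ul/li', $li) as $child) {
            $nested[] = trim($child->textContent);
        }

        $result[] = ['parent' => $parent, 'nested' => $nested];
    }
}

print_r($result);
```

    Because the paths are relative rather than starting with //, each query only looks below the node passed as the second argument.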