php dom domdocument getelementsbytagname

How to get the nesting level of DIVs?

$html ='<html>
<head>
    <title></title>
</head>
<body>
    <div class="">
        <div class="">
           <p><strong><span style="color:#FF0000"> Content1 </span></strong></p>
           <p style="text-align:center"> Content2 <img src="https://example.com/bla1.jpg"/></p>
        </div>
       
        <h2> Header </h2>
        <div class=""><p><strong> Content3 </strong></p> </div>

    </div>

    <div class=""> Content4 </div>
    <div class="">
                   <p> Content5 </p>  
                   <p> Content6 </p> 
                   <span> blah.. </span>
    </div>
</body></html>';

I need to have such an array:

That is, I need to know whether each DIV (including the P elements inside it) has a child or parent DIV.

Solution

Yours is a nice attempt, but I would rather get all p tags and then climb up the DOM hierarchy from each one, checking whether any ancestor is a div. This way you collect only those p nodes that sit inside a div, and skip the rest. In other words, it behaves like the CSS descendant selector div p (a div anywhere above the p), not the child selector div > p.

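$ps = array();
$doc = new DOMDocument('1.0', 'UTF-8');
// PHP variable names are case-sensitive: the markup is in $html, not $HTML.
// Note: the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2.
$doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

foreach ($doc->getElementsByTagName('p') as $p) {
   // Walk up the ancestors; the loop ends at the DOMDocument node (or null).
   $curr_node = $p->parentNode;
   while ($curr_node instanceof DOMElement) {
      if ($curr_node->tagName == 'div') {
        $ps[] = $p; // this p has a div ancestor, keep it
        break;
      }
      $curr_node = $curr_node->parentNode;
   }
}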

print_r($ps);
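
As an alternative to the manual climb, the same selection can probably be expressed with an XPath query. This is only a sketch: the shortened $html sample and the "Outside any div" paragraph are made up for illustration, and it collects the text of each p rather than the node itself.

```php
<?php
// Hypothetical sample markup, shortened from the question's $html.
$html = '<html><body>
<div><div><p>Content1</p><p>Content2</p></div><h2>Header</h2></div>
<p>Outside any div</p>
<div><p>Content5</p></div>
</body></html>';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Every <p> that has at least one <div> ancestor, at any depth.
$nodes = $xpath->query('//p[ancestor::div]');

$ps = [];
foreach ($nodes as $p) {
    $ps[] = trim($p->textContent);
}

print_r($ps);
```

The ancestor::div predicate does the same job as the while loop above, so the paragraph placed directly under body is left out.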

Update #1:

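To group the p nodes per div, recursively walk through each div's child nodes and collect every p you find into a result array, as below. Note that the list for an outer div also includes the p nodes of any div nested inside it.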

function getPs($node, &$result){
    foreach ($node->childNodes as $c_node) {
        // Skip text and comment nodes; only elements can be, or contain, a p.
        if (!($c_node instanceof DOMElement)) continue;
        if ($c_node->tagName == 'p') {
            $result[] = $c_node;
        }
        getPs($c_node, $result);
    }
}

$ps = [];

foreach($doc->getElementsByTagName('div') as $div){
   $child_ps = [];
   getPs($div,$child_ps);
   if(count($child_ps) > 0) $ps[] = $child_ps;
}

echo "<pre>";
print_r($ps);
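
Since the question is about div levels, one way to record how deeply each div is nested is to count its div ancestors. The following is a rough sketch, not part of the code above: the divDepth helper and the id attributes are hypothetical and only there to label the output.

```php
<?php
// Hypothetical helper: number of <div> ancestors above a node (0 = not nested).
function divDepth(DOMNode $node): int
{
    $depth = 0;
    // Climb until we leave the element tree (DOMDocument or null).
    for ($n = $node->parentNode; $n instanceof DOMElement; $n = $n->parentNode) {
        if ($n->tagName === 'div') {
            $depth++;
        }
    }
    return $depth;
}

// Hypothetical sample; the ids are assumptions added to tell the divs apart.
$html = '<html><body>
<div id="a"><div id="b"><p>Content1</p></div><div id="c"><p>Content3</p></div></div>
<div id="d">Content4</div>
</body></html>';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($html);

$levels = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $levels[$div->getAttribute('id')] = divDepth($div);
}

print_r($levels);
```

A depth of 0 means the div has no parent div; a div with depth 1 or more is a child of another div.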

Update #2:

To get the HTML string representation of each p node, change

$result[] = $c_node;

to

$result[] = $c_node->ownerDocument->saveXML( $c_node );
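
For comparison, here is a small sketch of serializing a single node with saveXML() and with saveHTML(), which also accepts a node argument. The one-paragraph sample markup is an assumption taken from the question's Content2 line; the two calls may differ in how they write empty elements such as img, so pick whichever matches the output you need.

```php
<?php
// Hypothetical sample based on the question's "Content2" paragraph.
$html = '<html><body><div><p>Content2 <img src="https://example.com/bla1.jpg"/></p></div></body></html>';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($html);

$p = $doc->getElementsByTagName('p')->item(0);

// Serialize only this node, once with XML rules and once with HTML rules.
$asXml  = $doc->saveXML($p);
$asHtml = $doc->saveHTML($p);

echo $asXml, PHP_EOL;
echo $asHtml, PHP_EOL;
```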