$html ='<html>
<head>
<title></title>
</head>
<body>
<div class="">
<div class="">
<p><strong><span style="color:#FF0000"> Content1 </span></strong></p>
<p style="text-align:center"> Content2 <img src="https://example.com/bla1.jpg"/></p>
</div>
<h2> Header </h2>
<div class=""><p><strong> Content3 </strong></p> </div>
</div>
<div class=""> Content4 </div>
<div class="">
<p> Content5 </p>
<p> Content6 </p>
<span> blah.. </span>
</div>
</body></html>';
I need to have such an array:
Does this mean checking whether each div (including its p elements) has a child or parent div?
Yours is a nice attempt, but I would rather get all the p tags and then climb up the DOM hierarchy from each one to check whether a div is an ancestor of the current p node. This way, you collect only those p nodes that have a div above them and skip the rest. In other words, it works like the CSS descendant selector div p (the child selector div > p would match only direct children).
$ps = array();
$doc = new DOMDocument('1.0', 'UTF-8');
// $html is the markup string defined above.
$doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
foreach ($doc->getElementsByTagName('p') as $p) {
    // Walk up the ancestors of this p until a div is found or the tree ends.
    $curr_node = $p->parentNode;
    while ($curr_node instanceof DOMElement) {
        if ($curr_node->tagName == 'div') {
            $ps[] = $p;
            break;
        }
        $curr_node = $curr_node->parentNode;
    }
}
print_r($ps);
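As an alternative to the manual climb above, the same "p with a div ancestor" test can be written as a single XPath query. This is only a sketch: the shortened markup is a hypothetical sample rather than the full $html from the question, and DOMXPath is swapped in for the loop, not something the original answer uses.

```php
<?php
// Hypothetical sample markup, shortened from the question's $html.
$html = '<html><body>'
      . '<div><div><p>Content1</p></div><h2>Header</h2></div>'
      . '<p>Outside any div</p>'
      . '</body></html>';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// "//p[ancestor::div]" selects every p that has a div somewhere above it.
// Each matching node appears once, even when it has several div ancestors.
$ps = [];
foreach ($xpath->query('//p[ancestor::div]') as $p) {
    $ps[] = trim($p->textContent);
}

print_r($ps);
```

The p that sits directly under body is left out, because it has no div ancestor.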
Update #1:
To get the p elements per div, you can recursively walk through all child nodes of each div, collect every p, and add the collection to the result as below:
function getPs($node, &$result) {
    foreach ($node->childNodes as $c_node) {
        // Only element nodes can be a p or contain one; skip text and comments.
        if (!($c_node instanceof DOMElement)) continue;
        if ($c_node->tagName == 'p') {
            $result[] = $c_node;
        }
        getPs($c_node, $result);
    }
}
$ps = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $child_ps = [];
    getPs($div, $child_ps);
    // Keep one group of p nodes per div, skipping divs without any p.
    if (count($child_ps) > 0) $ps[] = $child_ps;
}
echo "<pre>";
print_r($ps);
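The per-div grouping can also be sketched with a relative XPath query in place of the recursive helper. The markup below is a hypothetical, shortened sample, and $groups is an illustrative name. Note that with nested divs a p is listed under every div that encloses it, which is also how the recursive version behaves.

```php
<?php
// Hypothetical sample: one outer div with two inner divs, then a sibling div.
$html = '<html><body>'
      . '<div><div><p>Content1</p><p>Content2</p></div><div><p>Content3</p></div></div>'
      . '<div><p>Content5</p></div>'
      . '</body></html>';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$groups = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $texts = [];
    // ".//p" is evaluated relative to $div: every p at any depth below it.
    foreach ($xpath->query('.//p', $div) as $p) {
        $texts[] = trim($p->textContent);
    }
    // Skip divs that contain no p at all.
    if (count($texts) > 0) {
        $groups[] = $texts;
    }
}

print_r($groups);
```

If each p should appear only under its nearest div, the grouping key would have to be the closest div ancestor instead; that variation is not shown here.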
Update #2:
To get the HTML string representation of the p node, change
$result[] = $c_node;
to
$result[] = $c_node->ownerDocument->saveXML( $c_node );
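For reference, a small self-contained sketch of serializing a single node. It uses hypothetical one-line markup and also calls saveHTML() on the same node for comparison; saveHTML() is an addition here, not part of the original answer. The two methods may serialize void elements such as img differently, so pick whichever form suits the output you need.

```php
<?php
// Hypothetical one-paragraph document, loosely based on the question's markup.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML(
    '<html><body><div><p style="text-align:center">Content2 '
    . '<img src="https://example.com/bla1.jpg"></p></div></body></html>'
);

$p = $doc->getElementsByTagName('p')->item(0);

// XML-style serialization of just this node (the call used in the answer).
$asXml = $p->ownerDocument->saveXML($p);

// HTML-style serialization of the same node, shown for comparison.
$asHtml = $p->ownerDocument->saveHTML($p);

echo $asXml, PHP_EOL;
echo $asHtml, PHP_EOL;
```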