I am trying to develop a PHP script that replaces all divs in an HTML string with paragraphs, except those that have attributes (e.g. <div id="1">). The first thing my script currently does is use a simple str_replace() to replace all occurrences of <div> with <p>, which leaves behind any div tags with attributes as well as all the closing div tags (</div>). However, replacing the </div> tags with </p> tags is a bit more problematic.
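With the sample input from the script below, the intermediate string after the str_replace() step shows why the closing tags are the hard part. It has four <p> openers, one <div id="1"> opener and five identical </div> closers. Nothing in a </div> by itself says whether its opener became a <p>. The variable name $afterReplace is only illustrative.

```php
<?php
// Show what is left after the str_replace() step from the question.
$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
$afterReplace = str_replace("<div>", "<p>", $input);
echo $afterReplace, "\n";
// prints: <p>Hello world!</div><p><div id="1">How <p>are you</div> today?</div></div><p>I am fine.</div>
```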
So far, I have written a preg_replace_callback() callback that is meant to convert some </div> tags into </p> tags to match the opening <p> tags. It should ignore the </div> tags that close a <div> with attributes. Below is the script that I am using:
<?php
$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
$input2 = str_replace("<div>", "<p>", $input);
$output = preg_replace_callback("/(<div )|(<\/div>)/", 'replacer', $input2);

function replacer($matches){
    static $count = 0;
    $counter = count($matches);
    for($i = 0; $i < $counter; $i++){
        if($matches[$i] == "<div "){
            return "<div ";
            $count++;
        } elseif ($matches[$i] == "</div>"){
            $count--;
            if ($count >= 0){
                return "</div>";
            } elseif ($count < 0){
                return "</p>";
                $count++;
            }
        }
    }
}

echo $output;
?>
The script passes each remaining opening <div tag (one with attributes) and each </div> tag to the callback, which loops through the $matches array. A counter variable is incremented when it encounters an opening <div tag and decremented when it encounters a </div>. When the counter is less than 0, a </p> tag is returned. Otherwise a </div> is returned.
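Two things stand in the way of the counter. First, in replacer() each return exits the function immediately, so a $count++ placed after a return never runs. Second, str_replace() has already turned the plain <div> openers into <p>, so the regex never sees them. A single counter therefore cannot tell which </div> closes a <p>, even once the increments are fixed.

Below is a minimal sketch that keeps the preg_replace_callback() idea but uses a stack instead of a counter. It runs on the original input with no str_replace() step. For every opening div it records whether the tag had attributes, and it closes each </div> the same way its opener was handled. convertPlainDivs() is a hypothetical helper name. The sketch assumes PHP 7+ and well-formed, properly nested div tags that never appear inside comments or scripts and whose attribute values contain no ">".

```php
<?php
// Hypothetical helper: turns attribute-less <div>...</div> pairs into <p>...</p>
// and leaves <div ...> elements that have attributes untouched.
// Assumes PHP 7+ and well-formed, properly nested div tags whose attribute
// values contain no ">" and which never appear inside comments or scripts.
function convertPlainDivs(string $html): string
{
    // One entry per currently open div: true if its opener became <p>.
    $stack = [];

    return preg_replace_callback(
        '~<div\b([^>]*)>|</div\s*>~i',
        function (array $m) use (&$stack): string {
            if ($m[0][1] === '/') {
                // Closing tag: close it the same way its opener was handled.
                return array_pop($stack) ? '</p>' : '</div>';
            }
            // Opening tag: convert it only if it has no attributes.
            $plain = trim($m[1]) === '';
            $stack[] = $plain;
            return $plain ? '<p>' : $m[0];
        },
        $html
    );
}

$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
echo convertPlainDivs($input), "\n";
```

For the sample input, this prints the expected output given below.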
The output of the script should be:
<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>
Instead, the output I am getting is:
<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</p></p><p>I am fine.</p>
I have spent hours making as many edits to the script as I can think of, and I keep getting the same output. Can anyone explain to me where I am going wrong or offer an alternative solution?
Any help would be appreciated.
In addition to what mario commented, and comparable to phpQuery or QueryPath, you can use PHP's DOMDocument class to search for the <div> elements in question and replace them with <p> elements.
The cornerstones are the DOM (Document Object Model) and XPath:
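$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";

$doc = new DOMDocument();
// Wrap the fragment in a known element so its top-level nodes can be found again.
$doc->loadHTML("<div id='body'>{$input}</div>");
$root = $doc->getElementById('body');

$xp = new DOMXPath($doc);
// Only divs without any attribute (the wrapper has an id, so it is skipped).
$expression = './/div[not(@*)]';

// Re-query until no plain divs remain: cloning copies nested divs, so
// divs from an earlier result may no longer be in the document.
while (($r = $xp->query($expression, $root)) && $r->length) {
    foreach ($r as $div) {
        $new = $doc->createElement('p');
        foreach ($div->childNodes as $child) {
            $new->appendChild($child->cloneNode(true));
        }
        $div->parentNode->replaceChild($new, $div);
    }
}

// Serialize only the wrapper's children, not the <html>/<body> added by loadHTML().
$html = '';
foreach ($root->childNodes as $child) {
    $html .= rtrim($doc->saveHTML($child));
}
echo $html;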
This will give you:
<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>
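The re-query loop above is needed because the cloned children are copies. Below is a sketch of a single-pass variant, assuming PHP 7+ with the dom extension. It moves the children into the new <p> instead of cloning them. Nested divs from the XPath result therefore stay in the document and can be replaced in the same loop. replacePlainDivs() is a hypothetical helper name. Like the question, it keeps any div that has attributes (not(@*)).

```php
<?php
// Hypothetical helper: single-pass DOMDocument/DOMXPath version.
// Assumes PHP 7+ with the dom extension enabled.
function replacePlainDivs(string $input): string
{
    $doc = new DOMDocument();
    // Wrap the fragment so its top-level nodes can be found again afterwards.
    $doc->loadHTML("<div id='body'>{$input}</div>");
    $root = $doc->getElementById('body');
    $xp = new DOMXPath($doc);

    // The result is a snapshot in document order; not(@*) skips divs with attributes.
    foreach ($xp->query('.//div[not(@*)]', $root) as $div) {
        $p = $doc->createElement('p');
        // appendChild() moves the node, so nested divs keep their identity
        // and remain in the document for later iterations.
        while ($div->firstChild !== null) {
            $p->appendChild($div->firstChild);
        }
        $div->parentNode->replaceChild($p, $div);
    }

    // Serialize only the wrapper's children; rtrim() drops any trailing
    // whitespace the serializer may add.
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= rtrim($doc->saveHTML($child));
    }
    return $html;
}

$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
echo replacePlainDivs($input), "\n";
```

The outer div is handled before the inner one because the snapshot is in document order. Moving the inner div into the new <p> does not invalidate it, so it is simply replaced later in the same pass.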