Replacing end div tags using preg_replace_callback function


I am trying to develop a PHP script that replaces all divs in an HTML string with paragraphs except those which have attributes (e.g. <div id="1">). The first thing my script currently does is use a simple str_replace() to replace all occurrences of <div> with <p>, and this leaves behind any div tags with attributes and end div tags (</div>). However, replacing the </div> tags with </p> tags is a bit more problematic.

So far, I have written a preg_replace_callback callback that is meant to convert those </div> tags that close one of the new <p> tags into </p> tags, while leaving alone the </div> tags that close a <div> with attributes. Below is the script that I am using:

<?php
$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
$input2 = str_replace("<div>", "<p>", $input);
$output = preg_replace_callback("/(<div )|(<\/div>)/", 'replacer', $input2);

function replacer($matches){
    static $count = 0;
    $counter=count($matches);
    for($i=0;$i<$counter;$i++){
        if($matches[$i]=="<div "){
            return "<div ";
            $count++;
        } elseif ($matches[$i]=="</div>"){
            $count--;
            if ($count>=0){
                return "</div>";
            } elseif ($count<0){
                return "</p>";
                $count++;
            }
        }
    }
}
echo $output;
?>

The script basically puts all the remaining <div> and </div> tags into an array and then loops through it. A counter variable is incremented when it encounters a <div> tag and decremented when it encounters a </div> tag. When the counter drops below 0, a </p> tag is returned; otherwise a </div> is returned. The output of the script should be:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>

Instead, the output I am getting is:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</p></p><p>I am fine.</p>

I have spent hours making as many edits to the script as I can think of, and I keep getting the same output. Can anyone explain to me where I am going wrong or offer an alternative solution?

Any help would be appreciated.


Solution

  • Besides what mario suggested in the comments, and comparable to phpQuery or QueryPath, you can use PHP's DOMDocument class to find the <div> elements in question and replace them with <p> elements.

    The cornerstones are the DOM (Document Object Model) and XPath:

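    $input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";

    $doc = new DOMDocument();
    // Wrap the fragment in a container with an id, so it can be found again
    // and is itself never matched by the XPath expression below.
    $doc->loadHTML("<div id='body'>{$input}</div>");
    $root = $doc->getElementById('body');
    $xp = new DOMXPath($doc);

    // All <div> elements below the container that have no attributes at all.
    $expression = './/div[not(@*)]';

    // Replacing an outer <div> with a copy detaches the original inner <div>
    // nodes of the same result set, so query again until nothing matches.
    while (($r = $xp->query($expression, $root)) && $r->length) {
        foreach ($r as $div) {
            $new = $doc->createElement('p');
            foreach ($div->childNodes as $child) {
                $new->appendChild($child->cloneNode(true));
            }
            $div->parentNode->replaceChild($new, $div);
        }
    }

    // Serialize only the children of the container, not the wrapper itself.
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= rtrim($doc->saveHTML($child));
    }

    echo $html;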
    

    This will give you:

    <p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>
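
As for why the script in the question fails: both $count++ statements come after a return, so they never run and the counter only ever goes down. Even with that fixed, a single counter cannot tell which kind of opening tag a given </div> belongs to. If you want to stay with preg_replace_callback, one option is a stack, sketched below. This is only a sketch under some assumptions: the function name convertPlainDivs is made up for illustration, it needs PHP 7 or later, it runs on the original string (without the str_replace() step), and it assumes reasonably well-formed markup with no ">" inside attribute values.

```php
<?php
// Hypothetical helper (not from the question): converts attribute-less
// <div> elements to <p> using a stack instead of a single counter.
function convertPlainDivs(string $html): string
{
    // One entry per currently open <div>: the closing tag it should get.
    $stack = [];

    return preg_replace_callback(
        '~<div(\s[^>]*)?>|</div>~i',
        function (array $m) use (&$stack): string {
            if (strcasecmp($m[0], '</div>') === 0) {
                // Emit the closing tag recorded for the matching opening tag.
                // An unbalanced </div> is left unchanged.
                return array_pop($stack) ?? $m[0];
            }

            // Opening tag: group 1 holds the attribute text, if there is any.
            $hasAttributes = isset($m[1]) && trim($m[1]) !== '';
            if ($hasAttributes) {
                $stack[] = '</div>';
                return $m[0];
            }

            $stack[] = '</p>';
            return '<p>';
        },
        $html
    );
}

$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
echo convertPlainDivs($input);
```

For the input from the question this prints the same string as the DOM version above. The DOM approach is still the more robust choice for arbitrary HTML, since a regular expression does not handle comments, scripts, or unusual attribute quoting.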