php html domdocument

PHP convert divs to custom tags


I'm trying to convert some html tags to custom tags using PHP. I've been trying to use DOMDocument but finding it to be incredibly cumbersome. Is there a simple way to do this in PHP / DOMDocument?

Input:

<div class="element_wrapper">
    <div class="element_header">My header</div>
    <div class="element">
        <div class="name">Element Name</div>
    </div>
</div>

Desired Output:

<element_wrapper>
    <element_header>My header</element_header>
    <element>
        <name>Element Name</name>
    </element>
</element_wrapper>

My first approach (incomplete, added per AndrewL64's request):

<?php

$templates = Repository::fetchTemplates();

$classes = [
    'element_wrapper',
    'element',
    'name',
    'element_header',
];

foreach ($templates as $template) {
    $html = '<div>' . $template['html_body'] . '</div>';
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $finder = new DOMXPath($dom);
    foreach ($classes as $class) {
        $div_nodes = $finder->query("//div[@class='$class']");
        /** @var DOMNode $div_node */
        foreach ($div_nodes as $div_node) {

            /** @var DOMElement $custom_tag */
            // Note: nodeValue is only the text content, so nested child elements are lost here.
            $custom_tag = $dom->createElement($class, $div_node->nodeValue);
            if ($div_node->hasAttributes()) {
                foreach ($div_node->attributes as $attribute) {
                    if ($attribute->nodeValue === $class) {
                        continue;
                    }
                    $custom_tag->setAttributeNode($attribute);
                }
            }
            $div_node->parentNode->replaceChild($custom_tag, $div_node);
        }
    }
}

Many thanks in advance!
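For comparison, here is a minimal sketch of a pure-DOMDocument version of the first approach, under a few assumptions: the helper name `renameDivsToTags` is hypothetical, the class names are plain identifiers (they are interpolated into XPath unescaped), and the target class may appear anywhere in the class list. The key changes from the attempt above are that child nodes are *moved* into the new element instead of copying `nodeValue`, and the XPath results are snapshotted with `iterator_to_array` before nodes are replaced. No recursion is needed, because moved children stay in the document and are picked up by later queries.

```php
<?php
// Sketch only: rename <div class="X"> to <X> using DOMDocument, no regex.
// renameDivsToTags is a hypothetical helper name, not from the original post.
function renameDivsToTags(string $html, array $classes): string
{
    $dom = new DOMDocument();
    // Suppress libxml warnings while parsing; restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    // Wrap in a root <div> so fragments with several top-level nodes survive NOIMPLIED.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($classes as $class) {
        // Whole-token class match: 'element' does not match 'element_wrapper'.
        $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
        // Snapshot the result list so replacing nodes while looping is safe.
        foreach (iterator_to_array($xpath->query($query)) as $div) {
            $tag = $dom->createElement($class);
            // Copy attributes, dropping the class token that became the tag name.
            foreach ($div->attributes as $attr) {
                if ($attr->name === 'class') {
                    $rest = array_diff(preg_split('/\s+/', trim($attr->value)), [$class]);
                    if ($rest !== []) {
                        $tag->setAttribute('class', implode(' ', $rest));
                    }
                } else {
                    $tag->setAttribute($attr->name, $attr->value);
                }
            }
            // Move (not copy) every child so nested markup is preserved.
            while ($div->firstChild !== null) {
                $tag->appendChild($div->firstChild);
            }
            $div->parentNode->replaceChild($tag, $div);
        }
    }

    // Serialize the wrapper's children, which drops the wrapper <div> itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling `renameDivsToTags($template['html_body'], $classes)` inside the template loop would replace the per-node `preg_replace` and second-document import steps; `LIBXML_HTML_NOIMPLIED` needs libxml 2.7.8 or later.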


Solution

  • In the end I used preg_replace and multiple DOMDocument instances to make the changes to the HTML. Doing it purely with DOMDocument involves a mess of recursion and tree rebuilding that is hard to keep track of and feels awfully error-prone. My solution follows:

    <?php
    
    $templates = TemplateRepository::fetchAll();
    
    $classes = [
        'element_wrapper',
        'element',
        'name',
        'element_header',
    ];
    
    
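    // Custom tags like <element_wrapper> are not valid HTML, so libxml warns while parsing them; keep those warnings internal.
    libxml_use_internal_errors(true);
    foreach ($templates as $template) {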
        // We need to guarantee a root element for DOMDocument to be happy. (strip later)
        $html = '<div>' . $template['html_body'] . '</div>';
    
        $dom = new DOMDocument();
        $dom->loadHTML($html);
    
        $finder = new DOMXPath($dom);
    
        $class_found = false; // track if we found a class / will have changes.
        foreach ($classes as $class) {
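            // Match $class as a whole class token; a plain contains(@class, ...) would also match 'element' inside 'element_header'.
            $div_nodes = $finder->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]");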
            /** @var DOMNode $div_node */
            foreach ($div_nodes as $div_node) {
                $class_found = true;
    
                $content = $dom->saveHTML($div_node);
    
                // I know the class attribute comes first and starts with the target class, so the div opener can be swapped for the custom tag.
                $content = preg_replace('@^<div class="' . $class . '([^>]+)>@', '<' . $class . ' class="\1>', $content);
    
                // Clean up empty class attribute...just cuz.
                $content = preg_replace("@<$class class=\"\s*\"@", "<$class", $content);
    
                // Replace closing div with closing custom tag.  We can assume the end </div> is our target because DOMDocument did the heavy lifting.
                $content = preg_replace('@</div>$@', "</$class>", $content);
    
                // Create a new dom document from our new html string.  We need this to create a DOMNode that we can import into our original.
                $dom_element = new DOMDocument();
                $dom_element->loadHTML($content);
    
                // We only want the original html, so just grab the first child of the body.
                $node = $dom_element->getElementsByTagName('body')[0]->firstChild;
    
                // Import the new node into our original document so we can use it to replace our <div> version.
                $node = $dom->importNode($node, true);
    
                // Replace our original.
                $div_node->parentNode->replaceChild($node, $div_node);
            }
        }
    
        // Get the final updated html.
        $new_body = $dom->saveHTML($dom->getElementsByTagName('body')[0]->firstChild);
    
        // And finish by stripping off the wrapper div we added at the start (the s modifier lets . match newlines).
        $new_body = preg_replace('@^<div>(.*)</div>$@s', '\1', $new_body);
    }
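A plain `contains(@class, 'element')` XPath test is a substring check, so with this class list it also matches `element_wrapper` and `element_header`. The usual XPath 1.0 workaround is to pad the normalized class list with spaces and search for the class surrounded by spaces. A quick comparison, assuming markup shaped like the question's input:

```php
<?php
// Compare a substring class test with a whole-token class test in XPath 1.0.
$dom = new DOMDocument();
$dom->loadHTML('<div class="element_wrapper"><div class="element_header">h</div><div class="element">e</div></div>');
$xpath = new DOMXPath($dom);

// Substring match: 'element' occurs inside all three class values.
$substring = $xpath->query("//div[contains(@class, 'element')]")->length;

// Token match: ' element ' only occurs in the padded list for class="element".
$token = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' element ')]")->length;

echo $substring, ' ', $token, PHP_EOL; // 3 1
```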