
Keeping line breaks when using PHP's DomDocument appendChild


I'm trying to use PHP's DOMDocument to parse and modify an HTML document. From what I've read, setting formatOutput to true and preserveWhiteSpace to false should keep the tabs and newlines in order, but that doesn't seem to apply to newly created or appended nodes.

Here's the code:

$dom = new \DOMDocument;
$dom->formatOutput = true;
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile($htmlsource);
$tables = $dom->getElementsByTagName('table');
foreach($tables as $table)
{
    $table->setAttribute('class', 'tborder');
    $div = $dom->createElement('div');
    $div->setAttribute('class', 'm2x');
    $table->parentNode->insertBefore($div, $table);
    $div->appendChild($table);
}
$dom->saveHTMLFile($html);

Here's what the HTML looks like:

<table>
    <tr>
        <td></td>
    </tr>
</table>

Here's what I want:

<div class="m2x">
    <table class="tborder">
        <tr>
            <td></td>
        </tr>
    </table>
</div>

Here's what I get:

<div class="m2x"><table class="tborder"><tr>
<td></td>
        </tr></table></div>

Is there something I'm doing wrong? I've tried googling this as many different ways as I could think of, with no luck.


Solution

  • Unfortunately, you might need to write a function that indents the output how you want it. I made a little function you might find helpful.

    function indentContent($content, $tab="\t")
    {               
    
            // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
            $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
    
            // now indent the tags
            $token = strtok($content, "\n");
            $result = ''; // holds formatted version as it is built
            $pad = 0; // current indent depth
            $indent = 0; // change applied to $pad after the current line
            $matches = array(); // filled by preg_match()
    
            // scan each line and adjust indent based on opening/closing tags
            while ($token !== false) 
            {
                    $token = trim($token);
                    // test for the various tag states
    
                    // 1. open and closing tags on same line - no change
                    if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
                    // 2. closing tag - outdent now
                    elseif (preg_match('/^<\/\w/', $token, $matches))
                    {
                            $pad--;
                            if($indent>0) $indent=0;
                    }
                    // 3. opening tag (not self-closed, e.g. not <br/>) - don't pad this one, only subsequent tags
                    elseif (preg_match('/^<\w[^>]*(?<!\/)>.*$/', $token, $matches)) $indent=1;
                    // 4. no indentation needed
                    else $indent = 0;
    
                    // prefix the line with $pad copies of $tab (never a negative count)
                    $line = str_repeat($tab, max(0, $pad)) . $token;
                    $result .= $line."\n"; // add to the cumulative result, with linefeed
                    $token = strtok("\n"); // get the next token
                    $pad += $indent; // update the pad size for subsequent lines    
            }       
    
            return $result;
    }
    

    indentContent($dom->saveHTML()) will return:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
        <body>
            <div class="m2x">
                <table class="tborder">
                    <tr>
                        <td>
                        </td>
                    </tr>
                </table>
            </div>
        </body>
    </html>
    

    I created this function starting with this one.
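
For the exact markup in the question, a more compact variant can be sketched as follows. This is not the function above: the name indentTags is hypothetical, and it rejoins empty pairs such as <td></td> so they stay on one line, as the question asks. It assumes tag-only markup with no void elements such as <br> or <img> (they would be counted as opening tags) and no whitespace-sensitive content such as <pre>.

```php
<?php
// Sketch only (hypothetical helper, not part of DOMDocument):
// re-indent tag-only markup, one tag per line.
function indentTags(string $html, string $tab = '    '): string
{
    // 1. Put every tag on its own line, dropping whitespace between tags.
    $html = preg_replace('/>\s*</', ">\n<", trim($html));
    // 2. Rejoin empty pairs such as <td></td> so they stay on one line.
    $html = preg_replace('/<(\w+)([^>]*)>\n<\/\1>/', '<$1$2></$1>', $html);

    $depth = 0;
    $lines = [];
    foreach (explode("\n", $html) as $line) {
        if (strpos($line, '</') === 0) {
            // A closing tag is printed one level shallower.
            $depth = max(0, $depth - 1);
        }
        $lines[] = str_repeat($tab, $depth) . $line;
        if (preg_match('/^<\w[^>]*(?<!\/)>$/', $line)) {
            // A lone opening tag deepens the lines that follow it.
            $depth++;
        }
    }
    return implode("\n", $lines) . "\n";
}

// The markup from "Here's what I get" in the question.
$ugly = "<div class=\"m2x\"><table class=\"tborder\"><tr>\n<td></td>\n        </tr></table></div>";
echo indentTags($ugly);
```

Because the indent is built with str_repeat, $tab can be any string, for example two spaces or "\t". In the real script the input would be the string returned by $dom->saveHTML() rather than a literal.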