I am loading an XML-compliant PHP file into DOMDocument.
$domDoc = new DOMDocument();
$domDoc->recover = TRUE;
$domDoc->preserveWhiteSpace = TRUE;
$domDoc->formatOutput = FALSE;
$domDoc->substituteEntities = FALSE;
$domDoc->resolveExternals = FALSE;
Despite preserving whitespace and instructing it not to format the output, I still find that the leading whitespace in <?php ?> blocks is removed when I save the XML with $domDoc->saveXML().
Input:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<?php
// This is code.
// Something else.
echo 'test';
?>
</html>
Output:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<?php // This is code.
// Something else.
echo 'test';
?>
</html>
I want the output to be as close to the input as possible. Collapsing whitespace between attributes is acceptable, but collapsing whitespace between nodes or within a processing instruction (PI) is not. Why is PHP's DOMDocument / libxml2 changing the contents of the PI? Will I need to resort to manual DOM echoing to keep the whitespace completely preserved?
Leading white space in a PI node is actually okay to collapse, as the DOM considers the data portion of a processing instruction to be:
The content of this processing instruction. This is from the first non white space character after the target to the character immediately preceding the ?>.
(Emphasis mine.)
The preserveWhiteSpace setting only applies to text nodes, which is why it doesn't help you here.
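To see this for yourself, here is a minimal sketch that loads a shortened version of your input and inspects the PI node directly. The variable names ($xml, $doc, $data) are my own, and I am assuming a stock PHP build with the DOM extension enabled.

```php
<?php
// Shortened version of the input from the question (nowdoc, so nothing is interpolated).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<html>
<?php
// This is code.
echo 'test';
?>
</html>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = true; // only affects whitespace-only text nodes
$doc->loadXML($xml);

$data = null;
foreach ($doc->documentElement->childNodes as $node) {
    if ($node instanceof DOMProcessingInstruction) {
        // target is the PI name ("php"); data is everything after the target,
        // starting at the first non-whitespace character.
        $data = $node->data;
        var_dump($node->target, $data);
    }
}
```

The newline after <?php never makes it into $data, so there is nothing for saveXML() to write back out: the serializer emits the target, a single space, and then the data.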
In any case, I would advise against relying on embedded PHP being treated as a processing instruction, because PHP code can contain ?> within it (e.g. as part of a string literal), which would terminate the processing instruction early.
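A small sketch of that failure mode, using a made-up one-line document (the <root> element and the variable names are only for illustration):

```php
<?php
// The PHP snippet contains "?>" inside a string literal.
$xml = <<<'XML'
<root><?php echo '?>'; ?></root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$pi   = $doc->documentElement->firstChild; // the processing instruction
$rest = $pi->nextSibling;                  // whatever followed the first "?>"

var_dump($pi->data);        // the PI ends at the first "?>", inside the string literal
var_dump($rest->nodeValue); // the remainder of the PHP code, now an ordinary text node
```

The parser has no notion of PHP string syntax, so the PI is cut off at the first ?> it sees and the rest of the code ends up as character data in the document.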