php xslt utf-8 domdocument

Save XML after XSL transformation using PHP


I transform an XML file into another XML file in a PHP page using an XSL stylesheet. I load both the XML file and the XSL file with DOMDocument and run the transformation with XSLTProcessor.

The transformation works, but accented characters such as é are replaced in the output XML file, even though both my original XML file and my XSL stylesheet are encoded in UTF-8.

After simplexml_load_string, the accents are still correct. But when I save the file with saveXML, the created file no longer contains the accented characters. I don't understand why UTF-8 is not preserved. Do you have any idea?

Here is an example of an input XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Transfer xmlns="dase:v2.1" xmlns:ns2="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:id="_20220325095723763" xsi:schemaLocation="dase:v2.1 main.xsd">
    <Message>test</Message>
    <CodeList>
        <Element>villé</Element>
    </CodeList>
</Transfer>

Here is my XSL file:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.1" xmlns:dase="dase:v2.1"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ns2="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="dase:v2.1" exclude-result-prefixes="dase">

    <xsl:strip-space elements="*"/>
    <xsl:output indent="yes" method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="dase:Message">
        <xsl:comment>
            <xsl:text>New value</xsl:text>
        </xsl:comment>
    </xsl:template>

</xsl:stylesheet>

Here is my PHP code:

$xmlDoc = new DOMDocument('1.0', 'UTF-8');
$xmlDoc->formatOutput = true;
$xmlDoc->encoding = 'UTF-8';
$xmlDoc->load("./uploads/" . $fileName);
$xmlDoc->encoding = 'UTF-8';
$xslDoc = new DomDocument('1.0');

$xslDoc->load("./xslt/file.xsl");
$proc = new XSLTProcessor;

$proc->importStyleSheet($xslDoc);
$strXml = $proc->transformToXML($xmlDoc);

//echo ($proc->transformToXML($xmlDoc)); //here, the accent is fine

$convertedXML = simplexml_load_string($strXml);
$convertedXML->encoding = 'UTF-8';
//print_r($convertedXML); //here, the accent is fine

$convertedXML->encoding = 'UTF-8';
$convertedXML->saveXML("./uploads/Cleaned_" . $fileName); // the saved file has the accent problem

Thanks in advance


Solution

  • You can use html_entity_decode() to decode the character entities in the saved output and get back the accented characters:

    $outputFilename = "./Cleaned_" . $fileName;
    $output = $convertedXML->saveXML(); // get the XML content as a string
    $output = html_entity_decode($output, ENT_NOQUOTES, 'UTF-8'); // decode character entities back to UTF-8
    file_put_contents($outputFilename, $output); // write the decoded content to disk
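
One caveat with decoding after the fact: html_entity_decode() also turns &amp;, &lt; and &gt; back into raw characters, which can leave the file no longer well-formed if the data contains them. A likely cause of the original problem is that the stylesheet sets omit-xml-declaration="yes", so the string handed to simplexml_load_string carries no encoding declaration, and the document is then re-serialized with non-ASCII characters written as numeric character references. The sketch below shows two ways to avoid the round trip through SimpleXML. It assumes the xsl and simplexml extensions are loaded and that this script is saved as UTF-8. The inline XML and XSL are trimmed stand-ins for the question's files, and the target path is hypothetical.

```php
<?php
// Inline stand-ins for the uploaded XML and ./xslt/file.xsl (assumption: trimmed copies).
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<Transfer xmlns="dase:v2.1"><Message>test</Message>'
     . '<CodeList><Element>villé</Element></CodeList></Transfer>';

$xsl = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:dase="dase:v2.1"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="dase:v2.1" exclude-result-prefixes="dase">
    <xsl:output indent="yes" method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
    <xsl:template match="@* | node()">
        <xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
    </xsl:template>
    <xsl:template match="dase:Message">
        <xsl:comment>New value</xsl:comment>
    </xsl:template>
</xsl:stylesheet>
XSL;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);
$xslDoc = new DOMDocument();
$xslDoc->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc);

// The transformer's own string output is already UTF-8 (xsl:output encoding="UTF-8").
$strXml = $proc->transformToXml($xmlDoc);

// Reproduces the problem: the string has no XML declaration, so SimpleXML
// serializes the accent as a character reference instead of a raw UTF-8 byte sequence.
$viaSimpleXml = simplexml_load_string($strXml)->asXML();

// Option 1: write the transformed string as-is, without re-parsing it.
$target = sys_get_temp_dir() . '/Cleaned_example.xml'; // hypothetical path
file_put_contents($target, $strXml);

// Option 2: keep the result as a DOMDocument and declare its encoding
// explicitly before serializing; saveXML() then keeps the accents.
$resultDoc = $proc->transformToDoc($xmlDoc);
$resultDoc->encoding = 'UTF-8';
$viaDom = $resultDoc->saveXML();
```

Option 1 is the smallest change to the code in the question: replace the simplexml_load_string / saveXML pair with a single file_put_contents call on $strXml. Option 2 is useful when the result still has to be modified through the DOM before it is saved; note that saveXML() on the document writes an XML declaration regardless of omit-xml-declaration.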