In a PHP page, I transform an XML file into another XML file using an XSL stylesheet. I load both files with DOMDocument and run the transformation with XSLTProcessor.
The transformation works, but the accented characters are not preserved in the output XML file, even though both my original XML file and my XSL stylesheet are encoded in UTF-8.
After simplexml_load_string, the accents are still correct. But when I save the file with saveXML, the created file no longer contains the accented characters. I don't understand why UTF-8 is not preserved. Do you have any idea?
Here is an example of an input XML file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Transfer xmlns="dase:v2.1" xmlns:ns2="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:id="_20220325095723763" xsi:schemaLocation="dase:v2.1 main.xsd">
<Message>test</Message>
<CodeList>
<Element>villé</Element>
</CodeList>
</Transfer>
Here is my XSL file:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.1" xmlns:dase="dase:v2.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ns2="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="dase:v2.1" exclude-result-prefixes="dase">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="dase:Message">
<xsl:comment>
<xsl:text>New value</xsl:text>
</xsl:comment>
</xsl:template>
</xsl:stylesheet>
Here is my PHP code:
$xmlDoc = new DOMDocument('1.0', 'UTF-8');
$xmlDoc->formatOutput = true;
$xmlDoc->encoding = 'UTF-8';
$xmlDoc->load("./uploads/" . $fileName);
$xmlDoc->encoding = 'UTF-8';
$xslDoc = new DomDocument('1.0');
$xslDoc->load("./xslt/file.xsl");
$proc = new XSLTProcessor;
$proc->importStyleSheet($xslDoc);
$strXml = $proc->transformToXML($xmlDoc);
//echo ($proc->transformToXML($xmlDoc)); //here, the accent is fine
$convertedXML = simplexml_load_string($strXml);
$convertedXML->encoding = 'UTF-8';
//print_r($convertedXML); //here, the accent is fine
$convertedXML->encoding = 'UTF-8';
$convertedXML->saveXML("./uploads/Cleaned_" . $fileName); // the saved file has the accent problem
Thanks in advance
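You can use html_entity_decode() to decode the character references written by saveXML and get back the accented characters. Be aware that it also decodes &amp;, &lt; and &gt;, so it is only safe when the text contains no escaped markup characters: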
$outputFilename = "./Cleaned_" . $fileName;
$output = $convertedXML->saveXML(); // get the XML content as a string
$output = html_entity_decode($output, ENT_NOQUOTES, 'UTF-8'); // decode entities back into UTF-8 characters
file_put_contents($outputFilename, $output); // write the decoded content to disk
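An alternative that avoids post-processing the string, as a minimal sketch: the stylesheet sets omit-xml-declaration="yes", so the string returned by transformToXML most likely carries no encoding declaration, and libxml then serializes non-ASCII characters as numeric character references when saving. Loading that string into a DOMDocument and setting its encoding before saving should keep the characters literal. The inline $strXml below is a hypothetical stand-in for the real transformation result, and the variable names are illustrative.

```php
<?php
// Hypothetical stand-in for $proc->transformToXML($xmlDoc):
// no XML declaration, because the stylesheet omits it.
$strXml = "<Transfer xmlns=\"dase:v2.1\"><CodeList><Element>vill\u{00E9}</Element></CodeList></Transfer>";

// What the question does: with no encoding recorded on the document,
// the non-ASCII character is written as a numeric character reference.
$withoutEncoding = simplexml_load_string($strXml)->asXML();

// Sketch of the fix: load into DOMDocument and declare the output encoding.
$doc = new DOMDocument();
$doc->loadXML($strXml);
$doc->encoding = 'UTF-8'; // serialize as UTF-8 instead of escaping non-ASCII
$withEncoding = $doc->saveXML();

// To write to disk instead of a string (path taken from the question):
// $doc->save("./uploads/Cleaned_" . $fileName);
```

If the intermediate SimpleXML step is not needed, XSLTProcessor::transformToUri($xmlDoc, $path) may also be worth trying, since it writes the result straight to a file using the xsl:output settings.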