A piece of software generates a Windows-1252 XML file for me, and I would like to parse it in PHP and store the data in my database as UTF-8.
I tried a lot of solutions, such as the iconv and utf8_encode functions, but none of them worked.
It displays things like &#128; instead of just €...
My XML file looks like this:
<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<node>The price is 12 &#128; !</node>
&#128; seems to be the code of € (euro) in Windows-1252, where the euro sign is byte 0x80 (decimal 128).
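A quick way to see what the parser actually returns is to dump the bytes of nodeValue. The sketch below is a hypothetical diagnostic and is not part of the original question. It parses two tiny Windows-1252 documents: one containing the character reference &#128;, and one containing the raw byte 0x80. The helper name nodeBytes is made up for illustration. The sketch assumes PHP's DOM extension and a libxml2 build that supports Windows-1252.

```php
<?php
// Hypothetical diagnostic helper: return the hex bytes of the first <node>'s text.
// DOMDocument hands text back as UTF-8, whatever encoding the XML declares.
function nodeBytes(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return bin2hex($doc->getElementsByTagName('node')->item(0)->nodeValue);
}

$header = '<?xml version="1.0" encoding="Windows-1252"?>';

// Character reference: XML reads &#128; as Unicode code point U+0080
// (a C1 control character), regardless of the declared encoding.
$fromReference = nodeBytes($header . '<node>&#128;</node>');

// Raw byte 0x80: this is the euro sign in Windows-1252, so libxml2
// transcodes it to U+20AC.
$fromRawByte = nodeBytes($header . "<node>\x80</node>");

echo $fromReference, "\n"; // c280   (UTF-8 for U+0080)
echo $fromRawByte, "\n";   // e282ac (UTF-8 for U+20AC, the euro sign)
```

The value is therefore already UTF-8 by the time PHP sees it. Running iconv('Windows-1252', 'UTF-8', ...) on it converts it a second time instead of fixing it.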
I tried these functions:
<!doctype html>
<html lang='fr'>
<head>
<meta charset='UTF-8'>
</head>
<body>
<?php
// (the XML file is loaded into a DOMDocument here)
// (the node's text is read into $nodeValue here)
/* Not working */
$node = iconv('Windows-1252', 'UTF-8', $nodeValue);
/* Not working */
$node = utf8_encode($nodeValue);
?>
</body>
</html>
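As shown in this Stack Overflow question, the character reference &#128; is not converted to the euro sign (U+20AC). It is converted to U+0080, a control character in the Latin-1 Supplement block, because XML character references always refer to Unicode code points, whatever encoding the document declares. A workaround is to utf8_decode the value, which turns U+0080 back into the single byte 0x80. You then re-encode that byte from Windows-1252, where 0x80 is the euro sign: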
$node = iconv('Windows-1252', 'UTF-8', utf8_decode($node));
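To make the two steps of the workaround concrete, here is a sketch that prints the bytes after each step. utf8_decode() is deprecated as of PHP 8.2. The sketch therefore uses mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8') for the same UTF-8 to ISO-8859-1 step, which assumes the mbstring extension is installed.

```php
<?php
// Sketch of the two-step workaround, printing the bytes after each step.
// Assumes mbstring; mb_convert_encoding() replaces the deprecated utf8_decode().

$nodeValue = "\xC2\x80"; // what DOMDocument returns for &#128; (U+0080 in UTF-8)

// Step 1: UTF-8 -> ISO-8859-1. U+0080 becomes the single byte 0x80.
$latin1 = mb_convert_encoding($nodeValue, 'ISO-8859-1', 'UTF-8');

// Step 2: read that byte as Windows-1252, where 0x80 is the euro sign,
// and convert it back to UTF-8.
$fixed = iconv('Windows-1252', 'UTF-8', $latin1);

echo bin2hex($latin1), "\n"; // 80
echo bin2hex($fixed), "\n";  // e282ac
echo $fixed, "\n";           // €
```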
Here is some sample code that works:
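<?php
// &#128; is a character reference: the parser reads it as U+0080, not as the euro sign
$xml = '<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<node>The price is 12 &#128; !</node>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$nodes = $doc->getElementsByTagName('node');

// nodeValue is UTF-8: U+0080 -> byte 0x80 (utf8_decode) -> euro sign (Windows-1252 to UTF-8)
$node = iconv('Windows-1252', 'UTF-8', utf8_decode($nodes->item(0)->nodeValue));
echo $node; // The price is 12 € !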
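One caveat applies: utf8_decode() turns every character outside ISO-8859-1 into "?". Suppose the file also contains a raw 0x80 byte, which libxml2 already decoded correctly to U+20AC. The round trip above would then turn that euro into "?". The sketch below is a more targeted alternative. It assumes the only broken characters are the C1 controls U+0080 to U+009F produced by references like &#128;. It remaps just those code points through Windows-1252 and leaves everything else untouched. The function name fixC1References is hypothetical.

```php
<?php
// Hypothetical helper: repair only the C1 control characters U+0080..U+009F
// (e.g. from XML character references like &#128;) by reading each one as the
// Windows-1252 byte with the same value. All other characters are left as-is.
function fixC1References(string $utf8): string
{
    $result = preg_replace_callback(
        '/[\x{80}-\x{9F}]/u',
        function (array $m): string {
            // In UTF-8, U+0080..U+009F are encoded as 0xC2 followed by the
            // code point's value, so the second byte is the Windows-1252 byte.
            $converted = @iconv('Windows-1252', 'UTF-8', $m[0][1]);
            // 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252;
            // if the conversion fails, keep the original character.
            return $converted === false ? $m[0] : $converted;
        },
        $utf8
    );
    // preg_replace_callback() returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $utf8;
}

$xml = '<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
<node>The price is 12 &#128; !</node>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$value = $doc->getElementsByTagName('node')->item(0)->nodeValue;

echo fixC1References($value), "\n"; // The price is 12 € !
```

Because only the 32 code points in that range are touched, a euro that libxml2 already decoded correctly stays intact. Accented Latin-1 letters such as é also pass through unchanged.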