Tags: php, encoding, utf-8, windows-1252

How to convert euro (€) symbol from Windows-1252 to UTF-8?


A piece of software generates a Windows-1252 XML file for me. I would like to parse it in PHP and store the data in my database as UTF-8.

I have tried several solutions, such as the iconv and utf8_encode functions, without success.

The output shows garbled characters next to the euro sign instead of just €.

My XML file looks like this:

<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
    <node>The price is 12 &#128; !</node>

&#128; seems to be the code for the € (euro) sign in Windows-1252.

I tried these functions:

<!doctype html>
<html lang='fr'>
    <head>
        <meta charset='UTF-8'>
    </head>

    <body>

<?php
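    // Load the XML into a DOMDocument and read the node's text
    $doc = new DOMDocument();
    $doc->load('export.xml'); // hypothetical file name
    $nodeValue = $doc->getElementsByTagName('node')->item(0)->nodeValue;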

    /* Not working */
    $node = iconv('Windows-1252', 'UTF-8', $nodeValue);

    /* Not working */
    $node = utf8_encode($nodeValue);
?>

    </body>
</html>
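A short diagnostic sketch may help explain the output above. It assumes the sample XML from the question and that PHP's dom and iconv extensions are available; it prints the raw bytes DOMDocument stores for &#128; and then applies the same iconv() call the question tries.

```php
<?php
// Diagnostic sketch: inspect what DOMDocument stores for &#128; and what
// the question's iconv() call then produces. Requires dom and iconv.
$xml = '<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>'
     . '<node>The price is 12 &#128; !</node>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$nodeValue = $doc->documentElement->nodeValue;

// Bytes of the character after "The price is 12 " (16 bytes long):
// c280 is UTF-8 for U+0080; the euro sign U+20AC would be e282ac.
echo bin2hex(substr($nodeValue, 16, 2)), "\n"; // c280

// iconv() reads C2 and 80 as two separate Windows-1252 characters,
// "Â" and "€", hence the stray character in the output.
echo iconv('Windows-1252', 'UTF-8', $nodeValue), "\n"; // The price is 12 Â€ !
```

An XML numeric character reference always names a Unicode code point, so &#128; means U+0080 whatever encoding the file declares; the Windows-1252 meaning of byte 0x80 never comes into play.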

Solution

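  • As shown in this Stack Overflow question, the XML parser turns the &#128; reference into code point U+0080 (a control character in the Latin-1 Supplement block) rather than into U+20AC, the "proper" Unicode code point of the euro sign. A workaround is to utf8_decode the value, which turns U+0080 back into the single byte 0x80, and then "re-encode" it from Windows-1252, where 0x80 is the euro sign: $node = iconv('Windows-1252', 'UTF-8', utf8_decode($node));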

    Here is some sample code that works:

    <?php
    $xml = '<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
        <node>The price is 12 &#128; !</node>';
    
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $nodes = $doc->getElementsByTagName('node');
    // Map U+0080 back to byte 0x80, then decode that byte as Windows-1252
    $node = iconv('Windows-1252', 'UTF-8', utf8_decode($nodes->item(0)->nodeValue));
    echo $node; // The price is 12 € !
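Note that utf8_decode() is deprecated as of PHP 8.2, and it replaces any character outside Latin-1 with "?". As an alternative to the answer's approach (not what the answer itself uses), here is a sketch that remaps only U+0080 to U+009F to the characters Windows-1252 assigns to the same byte values, leaving all other text untouched. The function name fixCp1252References is hypothetical; the sketch assumes PHP 7+ for the \u{...} string escapes and the dom extension for the usage lines.

```php
<?php
// Alternative sketch (hypothetical helper, not the answer's method):
// map the C1 control characters U+0080..U+009F, which numeric references
// such as &#128; produce, to the characters Windows-1252 assigns to those
// byte values. All other characters are left untouched.
function fixCp1252References(string $text): string
{
    // Keys and values are UTF-8 strings built with PHP 7's \u{...} escape.
    // 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252,
    // so those code points are deliberately not remapped.
    $map = [
        "\u{80}" => "\u{20AC}", // euro sign
        "\u{82}" => "\u{201A}", "\u{83}" => "\u{0192}", "\u{84}" => "\u{201E}",
        "\u{85}" => "\u{2026}", "\u{86}" => "\u{2020}", "\u{87}" => "\u{2021}",
        "\u{88}" => "\u{02C6}", "\u{89}" => "\u{2030}", "\u{8A}" => "\u{0160}",
        "\u{8B}" => "\u{2039}", "\u{8C}" => "\u{0152}", "\u{8E}" => "\u{017D}",
        "\u{91}" => "\u{2018}", "\u{92}" => "\u{2019}", "\u{93}" => "\u{201C}",
        "\u{94}" => "\u{201D}", "\u{95}" => "\u{2022}", "\u{96}" => "\u{2013}",
        "\u{97}" => "\u{2014}", "\u{98}" => "\u{02DC}", "\u{99}" => "\u{2122}",
        "\u{9A}" => "\u{0161}", "\u{9B}" => "\u{203A}", "\u{9C}" => "\u{0153}",
        "\u{9E}" => "\u{017E}", "\u{9F}" => "\u{0178}",
    ];
    // strtr() with an array replaces whole byte sequences; every key is a
    // complete two-byte UTF-8 sequence, so no other character can match.
    return strtr($text, $map);
}

// Usage with the sample XML from the question.
$doc = new DOMDocument();
$doc->loadXML('<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>'
    . '<node>The price is 12 &#128; !</node>');
echo fixCp1252References($doc->documentElement->nodeValue), "\n"; // The price is 12 € !
```

Because only these 27 code points are touched, characters that are already valid UTF-8 (for example "é" or a real U+20AC) pass through unchanged, and no extension beyond dom is needed.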