Tags: php, utf-8, simplexml, php-5.4, symfony-2.7

PHP: simplexml_load_file gets strange characters from an XML file with UTF-8 encoding


The simplexml_load_file() function doesn't seem to handle accented characters correctly. The file is UTF-8 encoded, and the XML declaration has encoding="UTF-8".

I'm importing a UTF-8 encoded XML file with the simplexml_load_file() function. The file contains some accented characters, and when I do a print_r() or var_dump(), they are displayed as strange characters.

The first line of the XML file is

<?xml version="1.0" encoding="UTF-8"?>

In the code I'm running the basic

$xFile = simplexml_load_file($xmlFile);

I'm looping through the SimpleXML object and fetching the word with accented characters like so

$text = (string)$p->i;

Now

var_dump($text);

shows Ge├»rriteerd instead of Geïrriteerd
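A quick way to tell whether the string itself is damaged or only its display is to look at the raw bytes. The following is a minimal sketch, not the asker's actual code: the inline XML string is a hypothetical stand-in for the real file, and mb_check_encoding() assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical inline document standing in for the real XML file.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<p><i>Ge\xC3\xAFrriteerd</i></p>";

$p = simplexml_load_string($xml);
$text = (string) $p->i;

// Hex dump of the raw bytes. "ï" encoded as UTF-8 is the two bytes c3 af.
$hex = bin2hex($text);
echo $hex, PHP_EOL; // 4765c3af7272697465657264

// True when the byte sequence is well-formed UTF-8 (needs mbstring).
$isUtf8 = mb_check_encoding($text, 'UTF-8');
var_dump($isUtf8);
```

If the hex dump contains c3af where the ï should be, the parser has returned correct UTF-8 and the garbling is happening later, when the bytes are displayed.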

I've tried file_get_contents() followed by simplexml_load_string(), and I've also tried loading the XML file with DOMDocument, but the same 'wild' characters are displayed.

Any thoughts on what else I could do?

Note: I'm working on PHP 5.4; that's the PROD version and I can't change it.


Solution

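  • The issue was the Windows console's default encoding, not the XML parsing: the console was rendering the UTF-8 bytes using its default code page. I changed the console encoding to UTF-8 by running chcp 65001.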

    @Phil's comment was helpful.
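The garbled output can be reproduced without a console at all. The sketch below reinterprets the two UTF-8 bytes of "ï" as if they were code page 850 characters. Which code page the asker's console was actually using is an assumption here (850 and 437 both map these two bytes the same way), and the example relies on the local iconv implementation knowing the name CP850.

```php
<?php
// "ï" as UTF-8 is the byte pair 0xC3 0xAF.
$utf8Bytes = "\xC3\xAF";

// Pretend those bytes are CP850 text (assumed console code page) and
// convert them to UTF-8, mimicking what the console shows on screen.
$asSeenInConsole = iconv('CP850', 'UTF-8', $utf8Bytes);

echo $asSeenInConsole, PHP_EOL; // ├»
```

In CP850, 0xC3 is the box-drawing character ├ and 0xAF is », which matches the "Ge├»rriteerd" in the question. After chcp 65001 the console decodes the same bytes as UTF-8 and shows ï.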