
PHP SimpleXML: returned values have weird characters in place of hyphens and apostrophes


I have looked around and can't seem to find a solution so here it is.

I have the following code:

$file = "adhddrugs.xml";
$xmlstr = simplexml_load_file($file);
echo $xmlstr->report_description;

This is the simple version, but even with this, any hyphens or apostrophes are turned into sequences such as â€™ (an "a" with a circumflex, a euro sign, and a trademark sign).

Things I have tried are:

echo (string)$xmlstr->report_description; /* did not work */
echo addslashes($xmlstr->report_description); /* yes, I know this doesn't work with hyphens; I was mainly trying to see if I could escape the apostrophes */
echo addslashes((string)$xmlstr->report_description); /* did not work */

I also tried htmlspecialchars (again, I know it does not work with hyphens), htmlentities, and a few other tricks.

The situation is that I am getting the XML files from a feed, so I cannot change them, but they are pretty standard. The text with the hyphens and apostrophes is wrapped in a CDATA section, and the declared encoding is UTF-8. If I check the source, the hyphens and apostrophes are there.

Now just to see if the encoding was off or mislabeled or something else weird, I tried to view the raw XML file and sure enough it is displayed correctly.

I am sure that in my rush to find the answer I have overlooked something simple, and since this is the first time I have ever used SimpleXML, I am probably missing a very simple solution. Please don't dock me for it; I really did try to find the answer on my own.

Thanks again.


Solution

  • Do you know the document's character set?

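    SimpleXML returns its strings as UTF-8, so if the page is served without a declared charset, the browser may decode those bytes as Windows-1252 or ISO-8859-1 and show each curly apostrophe or dash as several odd characters. You could call header('Content-Type: text/html; charset=utf-8'); before any content is printed, if you haven't already.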
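Here is a minimal sketch of that suggestion, in case it helps. It assumes the feed really is UTF-8, as the question says. The inline XML string and its sample text are made up for illustration and stand in for adhddrugs.xml, which is not shown; only the element name report_description comes from the question.

```php
<?php
// Hypothetical sample standing in for the feed file (assumption: not the real data).
// "\u{2019}" is a right single quotation mark and "\u{2013}" is an en dash,
// the kind of characters that usually show up garbled.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<report><report_description><![CDATA['
     . "It\u{2019}s a long\u{2013}term study"
     . ']]></report_description></report>';

$doc = simplexml_load_string($xml);

// Casting the element to string yields the CDATA text as UTF-8 bytes.
$text = (string) $doc->report_description;

// Declare the charset before any output so the browser decodes the bytes as UTF-8.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Escape for HTML while telling PHP the input is UTF-8; the curly quote
// and the dash pass through unchanged.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
echo $safe;
```

With your real file you would swap simplexml_load_string($xml) for simplexml_load_file($file). If the page is a full HTML document, putting a meta charset="utf-8" tag in the head as well is a common belt-and-braces step.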