Tags: php, xml, simplexml, html-entities, cdata

PHP, SimpleXML, CDATA and HTML entities


I have some XML which contains CDATA.

For example the title: <title><![CDATA[School&rsquo;s Latest News]]></title>

When I parse the full XML document with simplexml_load_string, I can access the CDATA values by casting with (string). For example, I get the title with:

$title = (string) $news_xml->news->title;
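A minimal sketch reproducing this setup, assuming a hypothetical root element named <root> (the question only shows the news and title elements). It illustrates that the (string) cast returns the CDATA text verbatim: an XML parser does not expand HTML entities inside a CDATA section, so &rsquo; survives as eight literal characters.

```php
<?php
// Sample document modelled on the question; the <root> element name is an assumption.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <news>
    <title><![CDATA[School&rsquo;s Latest News]]></title>
  </news>
</root>
XML;

$news_xml = simplexml_load_string($xml);

// The cast returns the CDATA content unchanged: "&rsquo;" is still literal text,
// not a decoded character.
$title = (string) $news_xml->news->title;

echo $title, PHP_EOL; // School&rsquo;s Latest News
```

Note that &rsquo; is not one of XML's predefined entities, so it only parses here because it sits inside CDATA.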

The problem I have is that the &rsquo; is not displayed as an apostrophe (’) but instead comes out as garbled characters.

If I use html_entity_decode, I get the exact same thing.
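A sketch of decoding with the target charset passed explicitly. Before PHP 5.4 the default charset for html_entity_decode was ISO-8859-1, which has no ’, so the entity was left untouched; with 'UTF-8' the result contains the real character as three bytes. If that UTF-8 output is then shown on a page the browser reads in another encoding, it will still look garbled, which may be why the result seemed unchanged.

```php
<?php
$title = 'School&rsquo;s Latest News'; // as returned by (string) on the CDATA node

// &rsquo; is an HTML 4.01 named entity, so ENT_HTML401 knows it; 'UTF-8' makes
// PHP emit the character as the UTF-8 byte sequence E2 80 99.
$decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $decoded, PHP_EOL;            // School’s Latest News (when viewed as UTF-8)
echo bin2hex("\u{2019}"), PHP_EOL; // e28099
```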

If I use the LIBXML_NOCDATA option when calling simplexml_load_string I am able to look at the CDATA using print_r and don't have to explicitly call (string), but my HTML entities are still coming out garbled.
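A sketch of the LIBXML_NOCDATA variant, using the same hypothetical <root> element. The flag merges CDATA sections into ordinary text nodes, which is why print_r shows the title as a plain string (without it, print_r shows an empty SimpleXMLElement object for title). It does not decode HTML entities, so the text is unchanged and still needs html_entity_decode.

```php
<?php
$xml = '<root><news><title><![CDATA[School&rsquo;s Latest News]]></title></news></root>';

// LIBXML_NOCDATA turns CDATA into normal text nodes; entity text is left as-is.
$news_xml = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
print_r($news_xml->news);

// The entity is still literal and must be decoded separately (here to UTF-8).
$title = html_entity_decode((string) $news_xml->news->title, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $title, PHP_EOL;
```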

Any ideas why this isn't working?


Solution

  • &rsquo; is the Unicode character U+2019 (RIGHT SINGLE QUOTATION MARK, decimal 8217), see also http://www.rsquo.net/

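    If you send it to a browser (which I assume is what you mean by "presented as"), make sure the page is served as UTF-8, for example with a Content-Type: text/html; charset=utf-8 header or a <meta charset="utf-8"> tag. Otherwise the browser reads the three UTF-8 bytes of ’ as separate characters and shows garbage.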
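A sketch tying the answer together, assuming the title has already been extracted as in the question: decode &rsquo; to UTF-8, confirm it is U+2019 (decimal 8217) by rebuilding the code point from its three UTF-8 bytes, then emit it on a page that declares UTF-8. The helper render_title is hypothetical, and the header() call is skipped under the CLI, where it has no effect.

```php
<?php
// Title text as extracted from the CDATA section (entity still literal).
$raw = 'School&rsquo;s Latest News';

// Decode HTML entities into UTF-8; &rsquo; becomes the bytes E2 80 99.
$title = html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Take the three bytes right after "School" and rebuild the code point from the
// 3-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx (no mbstring needed).
$ch = substr($title, 6, 3);
$codePoint = ((ord($ch[0]) & 0x0F) << 12)
           | ((ord($ch[1]) & 0x3F) << 6)
           |  (ord($ch[2]) & 0x3F);
printf("bytes: %s, code point: U+%04X, decimal: %d\n", bin2hex($ch), $codePoint, $codePoint);
// bytes: e28099, code point: U+2019, decimal: 8217

// Hypothetical helper: wrap the title in a page that declares UTF-8.
// htmlspecialchars() escapes only & < > " ', so the ’ passes through unchanged.
function render_title(string $title): string
{
    $safe = htmlspecialchars($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>{$safe}</title></head>"
         . "<body><h1>{$safe}</h1></body></html>\n";
}

// Declare the encoding before any output is sent.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/html; charset=utf-8');
}
echo render_title($title);
```

Sending the header and the meta tag together is belt-and-braces; the HTTP header takes precedence when both are present.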