
Re-escape characters in an XML file


Consider the following XML structure (in this case, it's an Atom feed):

<feed xmlns="http://www.w3.org/2005/Atom">
<link href="http://example.com/atom/" rel="self" type="application/rss+xml"/>
<link rel="alternate" href="http://example.com/" type="text/html"/>
<title type="text">Example RSS feed</title>
<updated>2019-07-27T13:59:14-04:00</updated>
<subtitle>Example</subtitle>
<icon>http://example.com/favicon-32x32.png</icon>
<logo>http://example.com/logo.png</logo>
<rights>© 2019 Example</rights>
<author>
<name>Keanu Reeves</name>
<email>[email protected]</email>
<uri>http://example.com</uri>
</author>
<id>http://example.com/</id>
<entry>
<title>Example post</title>
<id>http://example.com/post/example</id>
<link rel="alternate" href="http://example.com/post/example"/>
<summary type="html">
Description of post. (Preview thing)
</summary>
<updated>2019-07-27T13:59:14-04:00</updated>
<author>
<name>Keanu Reeves</name>
</author>
</entry>
</feed>

If saved as an .atom file, this works flawlessly.

Though, I'd like to include the following in my post summary:

Example text, blah blah blah. <a href="/post/example">Read more...</a>
The above links get interpreted as litteral HTML when escaped correctly using the function under this code snippet. Good!
Now, heres litteral "<" and ">" characters.... <><><<<>>

The last line I want to include renders the .atom file invalid, obviously. So I encoded that last line to be XML compliant using the following PHP function:

echo htmlentities("Now, heres litteral \"<\" and \">\" characters.... <><><<<>>",ENT_XML1);

That outputted the following bit of text:

Now, heres litteral "&lt;" and "&gt;" characters.... &lt;&gt;&lt;&gt;&lt;&lt;&lt;&gt;&gt;

But now all of my feed readers (Slick RSS for Chrome and FeedR for Android) interpret the above as literal HTML!

So how can I re-escape those?

Cheers :)


Solution

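  • Your readers interpret it as HTML because, once the XML document is parsed, the contents of that field contain literal < and > [and likely other] metacharacters again. The XML parser undoes one layer of escaping, and since the summary is declared type="html", what remains is handed to an HTML renderer. Text that should display literally therefore needs two layers of escaping: HTML first, then XML.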

    // the literal string you want to encode.
    $string1 = "Now, heres litteral \"<\" and \">\" characters.... <><><<<>>";
    
    // oops but I want to make sure I don't accidentally pass in HTML to RSS readers that might
    // accidentally try to render it.
    $string2 = htmlentities($string1);
    
    // oh also I am writing XML directly instead of using a proper library to generate the document.
    // I know that this is a really bad idea, but I'm sure I have my reasons.
    // anywho, I should escape this text to be kludged directly into an XML doc.
    $string3 = htmlentities($string2, ENT_XML1);
    
    var_dump($string1, $string2, $string3);
    

    Output:

    string(56) "Now, heres litteral "<" and ">" characters.... <><><<<>>"
    string(109) "Now, heres litteral &quot;&lt;&quot; and &quot;&gt;&quot; characters.... &lt;&gt;&lt;&gt;&lt;&lt;&lt;&gt;&gt;"
    string(169) "Now, heres litteral &amp;quot;&amp;lt;&amp;quot; and &amp;quot;&amp;gt;&amp;quot; characters.... &amp;lt;&amp;gt;&amp;lt;&amp;gt;&amp;lt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;"
    

    $string2 is all the encoding you would need if you were feeding the data into something like an XMLDocument, DOMDocument, or similar object, but since it looks like you're doing things the hard way, you're going to have to go all the way to $string3.
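
For comparison, here is a minimal sketch of the "proper library" route mentioned above, assuming PHP's bundled DOM extension is available. The variable names are illustrative and only the summary element is built, not the whole feed. The text is HTML-escaped by hand (the $string2 step), and DOMDocument applies the XML layer when it serializes the text node, so no second htmlentities() call is needed.

```php
<?php
// Illustrative sketch: variable names are assumptions, not from the original feed code.
$summaryText = 'Now, heres litteral "<" and ">" characters.... <><><<<>>';

// Layer 1: HTML-escape the text that must be displayed literally,
// because <summary type="html"> content is rendered as HTML by readers.
$html = htmlentities($summaryText, ENT_QUOTES, 'UTF-8');

// Markup that *should* be rendered as HTML is appended unescaped at this layer.
$html .= ' <a href="/post/example">Read more...</a>';

// Layer 2: DOMDocument escapes the text node for XML when serializing.
$doc = new DOMDocument('1.0', 'UTF-8');
$summary = $doc->createElement('summary');
$summary->setAttribute('type', 'html');
$summary->appendChild($doc->createTextNode($html));
$doc->appendChild($summary);

$xml = $doc->saveXML($summary);
echo $xml, "\n";

// Round trip: parsing the XML back (as a feed reader would) yields the HTML string again.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
var_dump($parsed->documentElement->textContent === $html);
```

After the XML parse, the reader is left with HTML in which the angle brackets are still entities and the link is real markup, which is the behaviour the question asks for.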