
How to fix invalid WordPress feed caused by quote?


I have a WordPress site with custom taxonomies. I send newsletters automatically with Mailchimp for each taxonomy feed. Most feeds work, but the ones whose title contains an apostrophe are invalid.

For example, the feed whose title is "Val d'Oise" is invalid: https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.verdi-immobilier.com%2Fdepartements%2F95-val-doise%2Ffeed%2F

The validator returns the error "XML parsing error: <unknown>:11:24: undefined entity". After testing, it is indeed the apostrophe that causes the problem.

Here is the feed:

    <?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    >

<channel>
    <title>95 &#8211; Val-d&rsquo;oise &#8211; Verdi Immo</title>
    <atom:link href="https://www.verdi-immobilier.com/departements/95-val-doise/feed/" rel="self" type="application/rss+xml" />
    <link>https://www.verdi-immobilier.com</link>
    <description>Le dernier recours des propriétaires</description>
    <lastBuildDate>2019-11-01 06:24:28</lastBuildDate>
    <language>fr-FR</language>
    <sy:updatePeriod>
    hourly  </sy:updatePeriod>
    <sy:updateFrequency>
    1   </sy:updateFrequency>
    
<image>
    <url>https://www.verdi-immobilier.com/wp-content/uploads/2019/09/cropped-logo-ico-32x32.png</url>
    <title>95 &#8211; Val-d&rsquo;oise &#8211; Verdi Immo</title>
    <link>https://www.verdi-immobilier.com</link>
    <width>32</width>
    <height>32</height>
</image> 
</channel>
</rss>

The &rsquo; entity (’) does not seem to be interpreted. Do you guys know how to fix it?


Solution

  • Wrong answer: This is not a quote:

    &#8211;
    

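    It's converted to a dash by WordPress:

    https://en.wikipedia.org/wiki/Dash

    I first assumed that a dash is not a UTF-8 character and suggested changing the encoding. That assumption is wrong: the en dash can be encoded in UTF-8, and &#8211; is a valid numeric character reference in XML. My original suggestion was: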

    <?xml version="1.0" encoding="UTF-16"?>
    

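    Edit: Right answer: You are right, the problem is the &rsquo; entity, which is not valid in XML. XML predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;); &rsquo; is an HTML entity, hence the "undefined entity" error.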
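
To see why the parser rejects the title, here is a minimal Python sketch. It is not WordPress code, and the function and variable names are illustrative assumptions. It feeds the title element from the feed above to Python's standard XML parser, once with the named entity &rsquo; and once with the numeric reference &#8217;.

```python
import xml.etree.ElementTree as ET

# Title element as it appears in the feed (named HTML entity).
named = "<title>95 &#8211; Val-d&rsquo;oise &#8211; Verdi Immo</title>"
# Same title with the numeric character reference instead.
numeric = "<title>95 &#8211; Val-d&#8217;oise &#8211; Verdi Immo</title>"


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised, among other cases, for an undefined entity such as &rsquo;
        return False


named_ok = parses(named)      # &rsquo; is not defined in XML
numeric_ok = parses(numeric)  # &#8217; needs no entity definition
title_text = ET.fromstring(numeric).text

print(named_ok, numeric_ok, title_text)
```

The numeric form parses because a character reference points directly at a Unicode code point, so the parser does not need any entity declaration to resolve it.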

    Can you try replacing the ’ in the title of your post with

    &#8217; 
    

    (This numeric character reference is valid in XML and denotes the same character.) On the front end the ’ should still be displayed, and I hope the XML output will then contain a validly encoded character as well.

    replace:

    Val-d’oise
    

    with:

    Val-d&#8217;oise
    

    in the post-title.
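
If many titles are affected, the same replacement can be done mechanically. The following Python sketch only illustrates the idea and is not a WordPress hook; the helper name named_to_numeric is an assumption made for this example. It rewrites HTML named entities as numeric character references and leaves the five entities that XML already defines untouched.

```python
import re
from html.entities import name2codepoint

# Entities that XML defines itself; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text):
    """Replace HTML named entities (e.g. &rsquo;) with numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names unchanged.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


fixed = named_to_numeric("Val-d&rsquo;oise &amp; co")
print(fixed)
```

Existing numeric references such as &#8211; are not matched by the pattern, so they pass through unchanged.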

    It is dirty, but I hope this helps. I think WordPress had a similar bug years ago.

    Regards Tom