I have an RSS reader that works fine on some RSS feeds, but on one feed I have problems with special Danish characters. I expect it is an encoding issue.
I see these encodings in the raw HTTP response from this URL: http://www.sydvestjyllandsefterskole.dk/rss
Content-Type: text/xml; Charset=UTF-8
<?xml version="1.0" encoding="iso-8859-1" ?>
I have tried those two encodings and others, but nothing seems to work.
I have made a unit test (NUnit) to show the problem and what I have tried:
public IEnumerable<TestCaseData> RssItemEncodingTestCases
{
    get
    {
        // Same feed and expected text each time; only the encoding varies.
        yield return new TestCaseData("http://www.sydvestjyllandsefterskole.dk/rss", "Stort fremmøde til dejlig familiedag.", new ASCIIEncoding());
        yield return new TestCaseData("http://www.sydvestjyllandsefterskole.dk/rss", "Stort fremmøde til dejlig familiedag.", new UTF8Encoding());
        yield return new TestCaseData("http://www.sydvestjyllandsefterskole.dk/rss", "Stort fremmøde til dejlig familiedag.", new UnicodeEncoding());
        yield return new TestCaseData("http://www.sydvestjyllandsefterskole.dk/rss", "Stort fremmøde til dejlig familiedag.", Encoding.GetEncoding("ISO-8859-1"));
    }
}

[TestCaseSource("RssItemEncodingTestCases")]
public void TestEncoding(string url, string expectedToStartWith, Encoding encoding)
{
    var description = Read(url, encoding);
    Assert.That(description, Is.StringStarting(expectedToStartWith));
}

// Returns the description of the first <item> in the feed, or null if none is found.
public string Read(string url, Encoding encoding = null)
{
    var client = new WebClient();
    if (encoding != null)
        client.Encoding = encoding;
    try
    {
        using (XmlReader reader = new XmlTextReader(client.OpenRead(url)))
        {
            while (reader.Read())
            {
                if (reader.IsStartElement() && reader.Name == "item")
                {
                    while (reader.Read())
                    {
                        switch (reader.Name)
                        {
                            case "description":
                                return reader.ReadElementContentAsString();
                        }
                        // Stop scanning this item once its end tag is reached.
                        if (reader.Name == "item" && reader.NodeType == XmlNodeType.EndElement)
                            break;
                    }
                }
            }
        }
    }
    catch
    {
        // Errors are swallowed here; the method then falls through and returns null.
    }
    return null;
}
Expected: String starting with "Stort fremmøde til dejlig familiedag."
But was: "Stort fremmøde til dejlig familiedag.
Any idea how to get this decoded properly?
It was fixed by getting the feed's owners to change the encoding declared in the RSS feed to utf-8:
<?xml version="1.0" encoding="utf-8" ?>
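If the feed cannot be changed, a possible client-side workaround is to decode the bytes yourself and hand the XML reader characters instead of a raw stream. As far as I know, WebClient.Encoding only applies to the string upload/download methods and not to OpenRead, and a reader built on a byte stream follows the encoding named in the XML declaration, which would explain why none of the test cases made a difference. The sketch below is not the fix that was actually applied; FeedDecoding, SampleFeedBytes and the shortened sample text are hypothetical, and it assumes the bytes really are UTF-8, as the HTTP header claimed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class FeedDecoding
{
    // Hypothetical sample: the bytes are UTF-8, but the XML declaration
    // claims iso-8859-1, like the feed in the question. "\u00F8" is the letter ø.
    public static byte[] SampleFeedBytes()
    {
        const string xml =
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?>" +
            "<rss><channel><item>" +
            "<description>Stort fremm\u00F8de</description>" +
            "</item></channel></rss>";
        return Encoding.UTF8.GetBytes(xml);
    }

    // Returns the first <description> inside an <item>, or null if none is found.
    private static string FirstItemDescription(XmlReader reader)
    {
        if (reader.ReadToFollowing("item") && reader.ReadToDescendant("description"))
            return reader.ReadElementContentAsString();
        return null;
    }

    // Bytes in: the reader decodes according to the XML declaration.
    public static string ReadTrustingDeclaration(Stream stream)
    {
        using (var reader = XmlReader.Create(stream))
            return FirstItemDescription(reader);
    }

    // Characters in: the StreamReader decodes with the encoding we choose,
    // so the declared encoding no longer controls how the bytes are decoded.
    public static string ReadWithEncoding(Stream stream, Encoding encoding)
    {
        using (var text = new StreamReader(stream, encoding))
        using (var reader = XmlReader.Create(text))
            return FirstItemDescription(reader);
    }
}
```

With the sample bytes, ReadWithEncoding(new MemoryStream(bytes), Encoding.UTF8) should return the text with the ø intact, while ReadTrustingDeclaration should return it garbled. In the Read method from the question, the equivalent change would be to wrap client.OpenRead(url) in a StreamReader with Encoding.UTF8 before passing it to the XML reader.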