I've been using an XmlTextReader to read RSS for many years, but all of a sudden two feeds I use have introduced an extra line which is tripping up the parser.
The problem is the second line here conflicts with the first:
<link>http://www.eventjobsearch.co.uk/jobsrss/</link>
<atom:link href="http://www.eventjobsearch.co.uk/jobsrss/" rel="self" type="application/rss+xml"/>
The parser thinks the atom:link element is a duplicate of the link element. I don't personally need the atom:link line but as I'm using a stream, I can't see any way to remove this line or remove the colon (which would solve the problem).
How can I get rid of the colon in the stream so the built-in parser works again?
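// Namespaces used: System.Data, System.IO, System.Net, System.Xml, System.Web.Configuration
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(WebConfigurationManager.AppSettings["XmlJobsFeedUrl"]);
req.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)";
DataSet jobs = new DataSet("Jobs");
// Dispose the response, stream and reader once the feed has been read.
using (WebResponse response = req.GetResponse())
using (Stream stream = response.GetResponseStream())
using (XmlTextReader xmlTextReader = new XmlTextReader(stream))
{
    // This is the call that trips on the atom:link element.
    jobs.ReadXml(xmlTextReader);
}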
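One way to drop the atom:link line before the DataSet sees it is to load the feed into an XDocument, remove the elements in the Atom namespace, and hand the cleaned document to ReadXml. This is only a minimal sketch: the class and method names (FeedLoader, LoadWithoutAtomLinks) are made up for illustration, and it assumes the feed binds the atom: prefix to the standard Atom namespace URI.

```csharp
using System.Data;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; not part of the original code.
public static class FeedLoader
{
    // Assumption: the feed declares xmlns:atom="http://www.w3.org/2005/Atom".
    private static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    public static DataSet LoadWithoutAtomLinks(Stream feed)
    {
        // Buffer the whole feed so it can be edited before parsing.
        XDocument doc = XDocument.Load(feed);

        // Remove every <atom:link> element; the plain <link> elements stay.
        doc.Descendants(Atom + "link").Remove();

        DataSet jobs = new DataSet("Jobs");
        using (XmlReader reader = doc.CreateReader())
        {
            jobs.ReadXml(reader);
        }
        return jobs;
    }
}
```

With the code above, the response stream would be passed in as FeedLoader.LoadWithoutAtomLinks(stream) instead of being wrapped in an XmlTextReader. The whole feed is held in memory, which should be fine for a typical RSS document.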
Please see this question and solution. Immediately before calling jobs.ReadXml(...), you can read the schema:
jobs.ReadXmlSchema("http://www.thearchitect.co.uk/schemas/rss-2_0.xsd");
It's probably best to copy the xsd file to your own server rather than rely on the external URL each time the feed is read.
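Put together, the schema-first approach might look like the sketch below. The class and method names (JobsFeed, Read) and the schemaPath parameter are illustrative assumptions, with schemaPath pointing at wherever you keep your local copy of the xsd. The idea is that once the DataSet already has a schema, ReadXml should map the feed onto it instead of inferring a structure from the document, so elements the schema does not describe are skipped.

```csharp
using System.Data;
using System.IO;
using System.Xml;

// Hypothetical helper; not part of the original code.
public static class JobsFeed
{
    // schemaPath: assumed location of a local copy of the RSS 2.0 xsd.
    public static DataSet Read(Stream feed, string schemaPath)
    {
        DataSet jobs = new DataSet("Jobs");

        // Load the schema first, before any data is read.
        jobs.ReadXmlSchema(schemaPath);

        using (XmlTextReader reader = new XmlTextReader(feed))
        {
            // Data is loaded into the tables defined by the schema.
            jobs.ReadXml(reader);
        }
        return jobs;
    }
}
```

In the question's code this would be called with the response stream, for example JobsFeed.Read(stream, Server.MapPath("~/App_Data/rss-2_0.xsd")), where the App_Data path is only an assumed location.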