I'm using the SyndicationFeed class to consume some RSS feeds for articles. I wonder how to get only the text from the item's Summary field, without the HTML tags. For example, sometimes (not always) it contains HTML tags such as div, img, h, and p, e.g. </a></div>, <img src='http
I want to get rid of all the tags. Also, I'm not sure the Summary contains the full description from the RSS feed.
Should I use a regular expression for this, or is there a better method?
XmlReader reader = XmlReader.Create(response.GetResponseStream());
SyndicationFeed feed = SyndicationFeed.Load(reader);
foreach (SyndicationItem item in feed.Items)
{
string description = item.Summary.Text; // Summary is a TextSyndicationContent; its Text contains tags and not only the article text
}
Yeah I suppose regexes are the easiest built-in way to achieve this...
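// Requires: using System.Net; using System.Text.RegularExpressions;
// Get rid of the tags ([^>] also matches newlines, so tags that span several lines are removed too)
description = Regex.Replace(description, @"<[^>]+>", String.Empty);
// Then decode the HTML entities
description = WebUtility.HtmlDecode(description);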
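To make the two steps reusable, they could be wrapped in a small helper. This is only a sketch: the class name SummaryText and the method StripHtml are hypothetical (they are not part of SyndicationFeed or any library), and it assumes the summary is reasonably simple markup. A regex cannot reliably handle script/style blocks, comments, or a ">" inside an attribute value, so an HTML parser is the safer choice for messy feeds.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of System.ServiceModel.Syndication.
public static class SummaryText
{
    // Matches anything that looks like a tag, including tags that span lines.
    private static readonly Regex Tag = new Regex(@"<[^>]+>", RegexOptions.Compiled);

    // Matches runs of whitespace so they can be collapsed to a single space.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripHtml(string html)
    {
        // Summary can be missing on some items, so tolerate null or empty input.
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace each tag with a space so that text from adjacent
        // block elements (e.g. </p><p>) does not run together.
        string text = Tag.Replace(html, " ");

        // Decode entities such as &amp; only after the tags are gone,
        // so that an encoded "&lt;" is not mistaken for a tag.
        text = WebUtility.HtmlDecode(text);

        // Collapse the extra whitespace left behind by the removed tags.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Inside the foreach loop it could then be called as: string description = SummaryText.StripHtml(item.Summary?.Text);
Replacing tags with a space rather than an empty string is a trade-off: it keeps paragraphs apart, but it also splits a word that has inline markup in the middle of it.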