MVC - Strip unwanted text from RSS feed


I've got the following code in my RSS consumer (Vandelay Industries RemoteRSS) in my Orchard CMS implementation:

@using System.Xml.Linq
@{
    var feed = Model.Feed as XElement;
}
<ul>
@foreach (var item in feed
    .Element("channel")
    .Elements("item")
    .Take((int)Model.ItemsToDisplay))
{
    <li>@T(item.Element("description").Value)</li>
}
</ul>

The RSS feed I'm using is from Pinterest, and this bundles the image, link, and a short description all inside the 'description' elements of the feed.

<description>&lt;a href="/pin/215609900882251703/"&gt;&lt;img src="http://media-cache-ec2.pinterest.com/upload/88664686384961121_UIyVRN8A_b.jpg"&gt;&lt;/a&gt;How to install Orchard CMS on IIS Server</description>

My issue is that I don't want the text bits, and I also need to prefix the 'href=' links with 'http://www.pinterest.com'.

With my beginner skills I've managed to edit the original code into the version above. It displays the images as links, but the links are relative and therefore point to my own server. Each image is also followed by the short description.

So to summarise, I need a way to prefix all links with 'http://www.pinterest.com' and then to remove the free text after the image/links.


Solution

  • You should probably parse the description with an HTML parser such as HTML Agility Pack (http://htmlagilitypack.codeplex.com/), then tweak the result to add the prefix and keep only the link. Alternatively, you can learn regular expressions and do without a library, although that could be a little trickier and more error-prone.
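
As a rough illustration of the regular-expression route, here is a minimal sketch. It assumes every description looks like the sample in the question: a single <a> element wrapping the image, followed by plain text. The class name PinDescription, the method name Clean, and the BaseUrl constant are made up for this example and are not part of Orchard or the RemoteRSS module. It is deliberately brittle; an HTML parser copes far better if Pinterest changes its markup.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Orchard or Vandelay Industries RemoteRSS.
public static class PinDescription
{
    // Assumed base URL, taken from the question.
    private const string BaseUrl = "http://www.pinterest.com";

    // Matches the first <a ...>...</a> element in the description.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches href="/..." (root-relative, but not protocol-relative "//...").
    private static readonly Regex RelativeHref = new Regex(
        @"href=""(/(?!/)[^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns only the linked image, with a root-relative href made absolute.
    // Returns an empty string when no <a> element is found.
    public static string Clean(string description)
    {
        Match anchor = Anchor.Match(description ?? string.Empty);
        if (!anchor.Success)
        {
            return string.Empty;
        }

        // $1 is the captured relative path, e.g. /pin/215609900882251703/
        return RelativeHref.Replace(anchor.Value, "href=\"" + BaseUrl + "$1\"");
    }
}
```

In the view, the list item could then become something like <li>@Html.Raw(PinDescription.Clean(item.Element("description").Value))</li>, assuming the helper lives somewhere the view can see it (for example in the module's own code). Html.Raw writes the markup unencoded, so only use it with a feed you trust.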