I am trying to parse the Yahoo Answers feed: http://answers.yahoo.com/rss/allq
The issue is that every title starts with a prefix like
[ Category ] : Open Question :
which I do not want. I want to write a regexp to remove it.
Anything that removes all the characters between the starting [ and the first : should do it.
There is also a space after the : that needs to be removed.
Thanks in advance. I will also try to find a solution myself.
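One way the rule described above could be sketched, in Python, is shown below. The sample prefix contains two colons (one after the bracketed category, one after "Open Question"), so this sketch assumes the whole prefix through the second colon and its trailing space should go, and that the status text itself never contains a colon. The names PREFIX and strip_prefix and the sample titles are made up for illustration.

```python
import re

# Assumed prefix shape: "[ Category ] : Some Status : "
#   \[[^\]]*\]   the bracketed category
#   \s*:\s*      the colon after the category
#   [^:]*:       the status text up to and including the next colon
#   \s*          the space(s) after that colon
PREFIX = re.compile(r"^\s*\[[^\]]*\]\s*:\s*[^:]*:\s*")

def strip_prefix(title):
    """Return the title without the leading '[ Category ] : Status : ' part."""
    return PREFIX.sub("", title, count=1)

# Hypothetical titles, not taken from the real feed.
print(strip_prefix("[ Cars ] : Open Question : How do I change a tire?"))
print(strip_prefix("A title without any prefix"))
```

Titles that do not start with a bracketed category are returned unchanged, and colons later in the real title are left alone because the pattern is anchored at the start and applied once.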
Have you considered using Yahoo's YQL service to parse this feed (or other web pages)?
They already have sample queries for you to get at Yahoo Answers data:
answers.getbycategory: http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.getbycategory%20where%20category_id%3D2115500137%20and%20type%3D%22resolved%22
answers.getbyuser: http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.getbyuser%20where%20user_id%3D%22YbaMGtHFaa%22
answers.getquestion: http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.getquestion%20where%20question_id%3D%2220090526102023AAkRbch%22
answers.search: http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.search%20where%20query%3D%22cars%22%20and%20category_id%3D2115500137%20and%20type%3D%22resolved%22
(Just an FYI in case you weren't aware of this convenient service. I use it instead of screen scraping with regexes.)
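The console links above carry the YQL statement URL-encoded after "#h=". As a small sketch, it can be decoded with Python's standard library to see the query being run. The helper name yql_from_console_url is hypothetical, and this only decodes the link; it does not call the YQL service.

```python
from urllib.parse import urlsplit, unquote

def yql_from_console_url(url):
    """Extract and decode the YQL statement stored in the '#h=' fragment."""
    fragment = urlsplit(url).fragment          # e.g. "h=select%20*%20from..."
    key, _, value = fragment.partition("=")    # split at the first '=' only
    if key != "h":
        raise ValueError("no 'h=' fragment in URL")
    return unquote(value)

url = ("http://developer.yahoo.com/yql/console/#h=select%20*%20from%20"
       "answers.search%20where%20query%3D%22cars%22%20and%20"
       "category_id%3D2115500137%20and%20type%3D%22resolved%22")
print(yql_from_console_url(url))
# select * from answers.search where query="cars" and category_id=2115500137 and type="resolved"
```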