How to extract text from reasonably sane HTML?

My question is sort of like this question but I have more constraints:

I know the documents are reasonably sane
they are very regular (they all came from the same source)
I want about 99% of the visible text
about 99% of what is visible at all is text (they are more or less RTF converted to HTML)
I don't care about formatting or even paragraph breaks.

Are there any tools set up to do this or am I better off just breaking out RegexBuddy and C#?

I'm open to command line or batch processing tools as well as C/C#/D libraries.

Solution

You need to use the HTML Agility Pack.

You probably want to find an element using LINQ and the Descendants call, then get its InnerText.
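
As a rough illustration of that approach, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HtmlText and the method name Extract are made up for this example and are not part of the library. Rather than taking InnerText from a single element, it collects every text node with Descendants and skips those inside script or style elements, since that text is not visible. Whitespace and paragraph breaks are deliberately thrown away, which matches the "don't care about formatting" constraint.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper for illustration; not part of HTML Agility Pack.
static class HtmlText
{
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var pieces = doc.DocumentNode
            .Descendants()
            // Keep only text nodes, and drop those that live in <script> or <style>.
            .Where(n => n.NodeType == HtmlNodeType.Text
                        && n.ParentNode.Name != "script"
                        && n.ParentNode.Name != "style")
            // Turn entities such as &amp; back into characters and trim whitespace.
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(t => t.Length > 0);

        // Formatting does not matter here, so join everything with single spaces.
        return string.Join(" ", pieces);
    }
}
```

For batch processing, something like File.WriteAllText(path + ".txt", HtmlText.Extract(File.ReadAllText(path))) in a loop over the input files would probably be enough, though encoding handling may need attention depending on how the documents were produced.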