I have a string which has Markdown tags embedded inside it. I do not want to encode the Markdown as anything else; I just want to rip out all of the tags.
How can I do this quickly? I need to do this as part of a batch processing job which processes around 5 million pieces of text, so speed is very important.
I looked at MarkdownSharp and using its `Transform` method, but I'm not sure it's the best way of doing this. I just want plain-text output with no tags inside. I'm even considering a regex removal, but I'm not sure what the most performant option would be.
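For comparison, a regex removal could look like the minimal Python sketch below. The question is about .NET, but the same patterns should work with .NET's `Regex` class, using `$1` instead of `\1` in the replacements. It assumes only a common subset of Markdown (ATX headings, blockquotes, bullet and numbered lists, inline links and images, bold, italic, inline code and code-fence lines); the rule list and the `strip_markdown` name are illustrative, not part of any library. The patterns are compiled once up front, since recompiling them for each of millions of documents would be wasted work.

```python
import re

# Each (pattern, replacement) pair removes one Markdown construct.
# Compiled once at import time so the cost is not paid per document.
# Assumption: only this common subset of Markdown needs handling.
_RULES = [
    (re.compile(r"^[ \t]*```.*$", re.MULTILINE), ""),                # code-fence lines (content kept)
    (re.compile(r"!\[([^\]]*)\]\([^)]*\)"), r"\1"),                   # images -> alt text
    (re.compile(r"\[([^\]]*)\]\([^)]*\)"), r"\1"),                    # inline links -> link text
    (re.compile(r"^[ ]{0,3}#{1,6}[ \t]+", re.MULTILINE), ""),         # ATX heading markers
    (re.compile(r"^[ ]{0,3}> ?", re.MULTILINE), ""),                  # blockquote markers
    (re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", re.MULTILINE), ""),  # list markers
    (re.compile(r"\*\*(.+?)\*\*"), r"\1"),                            # **bold**
    (re.compile(r"(?<!\w)__(.+?)__(?!\w)"), r"\1"),                   # __bold__
    (re.compile(r"\*(.+?)\*"), r"\1"),                                # *italic*
    (re.compile(r"(?<!\w)_(.+?)_(?!\w)"), r"\1"),                     # _italic_ (leaves snake_case alone)
    (re.compile(r"`([^`]*)`"), r"\1"),                                # `inline code`
]


def strip_markdown(text: str) -> str:
    """Remove common Markdown syntax, keeping the visible text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


print(strip_markdown("# Title\n\nSome **bold** and a [link](http://example.com)."))
```

Each rule is a separate pass over the string, and regexes cannot follow nesting, reference-style links or raw HTML blocks, so it is worth measuring against a sample of the real data before committing to this approach.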
You could probably use MarkdownSharp or any other similar library (I recommend Strike, since it is surprisingly fast!) to convert the Markdown to HTML and then use HtmlAgilityPack to extract the text.
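The text-extraction step might look like the sketch below. It uses Python's standard-library `html.parser` as a stand-in for HtmlAgilityPack (whose `InnerText` property plays the same role in .NET), and it assumes the Markdown has already been converted to an HTML string by a library such as MarkdownSharp or Strike. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data and silently drops every tag."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for each run of text between tags.
        self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def html_to_text(html: str) -> str:
    """Return only the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


print(html_to_text("<p>Some <strong>bold</strong> &amp; <em>italic</em> text.</p>"))
```

Because tags are simply dropped, adjacent blocks run together (`<p>a</p><p>b</p>` becomes `ab`), and only whitespace already present in the HTML survives. This route also builds a full HTML string for every document and then parses it again, which is extra work repeated across all 5 million pieces of text.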
A faster option, but more work for you, would be to modify an existing Markdown parser so that it emits plain text instead of HTML.
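To give a rough idea of what emitting plain text directly involves, here is a hypothetical single-pass scanner in Python. The names `inline_to_text` and `markdown_to_text` are made up for this sketch, and it is not derived from MarkdownSharp's or Strike's source: wherever a real renderer would write HTML, it writes only the visible text. It assumes a small subset of Markdown (ATX headings, blockquotes, `-`/`*`/`+` bullets, backslash escapes, code spans, `*`/`_` emphasis, inline links and images); reference links, numbered lists, tables and HTML blocks are not handled, and leading indentation is discarded.

```python
def inline_to_text(src: str) -> str:
    """Emit the visible text of one line of inline Markdown in a single left-to-right scan."""
    out: list[str] = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == "\\" and i + 1 < n:
            # Backslash escape: keep the escaped character, drop the backslash.
            out.append(src[i + 1])
            i += 2
        elif ch == "`":
            # Code span: copy its contents verbatim up to the closing backtick.
            end = src.find("`", i + 1)
            if end == -1:
                out.append(ch)
                i += 1
            else:
                out.append(src[i + 1:end])
                i = end + 1
        elif ch in "*_":
            # Emphasis delimiter: drop it, except intraword underscores (snake_case).
            if ch == "_" and 0 < i < n - 1 and src[i - 1].isalnum() and src[i + 1].isalnum():
                out.append(ch)
            i += 1
        elif ch == "[" or (ch == "!" and src.startswith("[", i + 1)):
            # Inline link or image: keep the bracketed text, skip the "(url)" part.
            start = i + 1 if ch == "[" else i + 2
            mid = src.find("](", start)
            close = src.find(")", mid + 2) if mid != -1 else -1
            if close == -1:
                out.append(ch)
                i += 1
            else:
                out.append(inline_to_text(src[start:mid]))
                i = close + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def markdown_to_text(doc: str) -> str:
    """Strip block-level prefixes (headings, quotes, bullets) per line, then inline syntax."""
    lines = []
    for line in doc.split("\n"):
        stripped = line.lstrip()
        hashes = len(stripped) - len(stripped.lstrip("#"))
        if 1 <= hashes <= 6 and stripped[hashes:hashes + 1] in ("", " "):
            stripped = stripped[hashes:].lstrip()  # "# Title" -> "Title"
        elif stripped.startswith(">"):
            stripped = stripped[1:].lstrip()       # "> quote" -> "quote"
        elif stripped[:2] in ("- ", "* ", "+ "):
            stripped = stripped[2:]                # "- item" -> "item"
        lines.append(inline_to_text(stripped))
    return "\n".join(lines)


print(markdown_to_text("# Title\n- [link](http://example.com) and **bold**"))
```

A per-character loop is not fast in CPython, so treat this as a shape rather than a benchmark; ported to a `StringBuilder`-based loop in C#, the same idea avoids building an HTML string and then parsing it a second time.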