Remove Markdown tags from string

I have a string with Markdown tags embedded in it. I do not want to convert the Markdown to anything else; I just want to strip out all of the tags.

How can I do this quickly? I need to do this as part of a batch processing job which processes around 5 million pieces of text, so speed is very important.

I looked at MarkdownSharp and its Transform method, but I'm not sure it's the best way of doing this. I just want plain-text output with no tags in it. I'm even considering removing the tags with a regex, but I'm not sure which option would be the most performant.

Solution

You could use MarkdownSharp or a similar library (I recommend Strike, since it is surprisingly fast) to convert the Markdown to HTML, and then use HtmlAgilityPack to extract the text.

A faster option, though more work for you, would be to modify an existing Markdown parser so that it produces plain text directly instead of HTML.
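
As a rough illustration of the library-based approach (Markdown to HTML, then HTML to text), here is a minimal sketch. It assumes the MarkdownSharp and HtmlAgilityPack packages are referenced; the class and method names MarkdownStripper and ToPlainText are made up for this example. It has not been benchmarked against 5 million inputs, so treat it as a starting point rather than a tuned solution.

```csharp
using HtmlAgilityPack;
using MarkdownSharp;

public static class MarkdownStripper
{
    // Hypothetical helper, named for illustration only.
    // The Markdown instance is passed in so the caller can reuse it
    // instead of constructing a new one for every input string.
    public static string ToPlainText(Markdown markdown, string source)
    {
        // Step 1: Markdown -> HTML.
        string html = markdown.Transform(source);

        // Step 2: HTML -> text. InnerText concatenates the text nodes
        // and drops the tags themselves.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        string text = doc.DocumentNode.InnerText;

        // InnerText leaves entities such as &amp; encoded, so decode them.
        return HtmlEntity.DeEntitize(text).Trim();
    }
}
```

Usage would look something like this:

```csharp
var markdown = new Markdown();
string plain = MarkdownStripper.ToPlainText(markdown, "Some **bold** text");
```

For a batch job it is probably worth creating one Markdown instance per worker thread rather than sharing a single instance across threads, since I would not assume the converter is thread-safe.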
