c#, html-encode, pattern-recognition

Recognize a pattern to extract words from an HTML-encoded string in C#


I am looking for help recognizing a pattern in a string that is HTML-encoded.

Suppose I have an HTML-encoded string like:

string strHTMLText = @"<p>Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.</p>";

I need to extract the words [[@Code1]], [[@Code2]], and [[@Code3]]. They are dynamic, and their count is not known in advance. These words are placeholders that will be replaced with other values in the provided HTML text.

I want to recognize the pattern [[@something]] and collect all occurrences in an array or similar collection, so that I can later use them to fetch the relevant values from the database.


Solution

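  • // Requires: using System.Linq; using System.Text.RegularExpressions; using System.Web;
    string strHTMLText = @"<p>Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.</p>";
    // Decode HTML entities first, in case the placeholders were encoded.
    var input = HttpUtility.HtmlDecode(strHTMLText);
    // Lazily match [[@...]]; group 1 is the text between "[[@" and "]]".
    var list = Regex.Matches(input, @"\[\[@(.+?)\]\]")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)   // "Code1", "Code2", "Code3"; use m.Value for the full "[[@Code1]]"
        .ToList();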
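Since the question also mentions substituting the placeholders with values fetched later, here is a minimal sketch of how the same regex could drive both steps. It is only one possible approach, not part of the original answer. The names PlaceholderDemo, ExtractNames, and Fill are hypothetical, the dictionary merely stands in for the database lookup, and WebUtility.HtmlDecode (from System.Net) is swapped in for HttpUtility.HtmlDecode so the example does not need a System.Web reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class PlaceholderDemo
{
    // Matches [[@name]] and captures the name without the brackets and '@'.
    private static readonly Regex Placeholder = new Regex(@"\[\[@(.+?)\]\]");

    // Returns the distinct placeholder names found in the (decoded) text.
    public static List<string> ExtractNames(string html)
    {
        string decoded = WebUtility.HtmlDecode(html);
        return Placeholder.Matches(decoded)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .ToList();
    }

    // Replaces each placeholder with its value from the lookup.
    // Placeholders with no entry are left unchanged.
    // Note: the returned text is the HTML-decoded form of the input.
    public static string Fill(string html, IDictionary<string, string> values)
    {
        string decoded = WebUtility.HtmlDecode(html);
        return Placeholder.Replace(decoded, m =>
            values.TryGetValue(m.Groups[1].Value, out string value) ? value : m.Value);
    }

    public static void Main()
    {
        string html = @"<p>Pellentesque habitant [[@Code1]] morbi [[@Code2]] et netus [[@Code1]].</p>";

        // Hypothetical lookup table standing in for the database query.
        var values = new Dictionary<string, string>
        {
            ["Code1"] = "alpha",
            ["Code2"] = "beta"
        };

        Console.WriteLine(string.Join(", ", ExtractNames(html)));
        Console.WriteLine(Fill(html, values));
    }
}
```

Calling Distinct() before the lookup avoids querying the database twice for a placeholder that appears more than once, and leaving unknown placeholders in place makes missing values easy to spot in the output.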