How to remove start and end html tag using C#?


If I have some html code, like:

<p>Some text</p><p>More text</p>

...and I want to remove the start and end tags of that string, so I end up with:

Some text</p><p>More text

What would the C# code look like? It should work with any tag type, including tags that have classes or other attributes. I just need to remove the start and end tags.


Solution

  • Use Regex

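    using System;
    using System.Text.RegularExpressions;

    var item = "<p>Some text</p><p>More text</p>";
    item = Regex.Replace(item, @"^<[^<>]*>", ""); // remove the tag at the start
    item = Regex.Replace(item, @"<[^<>]*>$", ""); // remove the tag at the end
    Console.WriteLine(item); // prints: Some text</p><p>More text


    Regex Breakdown:

    ^: matches the start of the string

    <: the opening angle bracket of the tag

    >: the closing angle bracket of the tag

    [^<>]*: matches any character except < and >, as many times as possible, so the match stops at the end of the first tag. Only the leading ^ inside the brackets negates the class; a second ^ or a . would be treated as literal characters to exclude, and the pattern would then miss tags whose attributes contain a period (for example href="page.html").

    The second call does the same thing, this time anchoring the match to the end of the string with $ at the end of the expression.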
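
If this is needed in more than one place, the two replacements can be wrapped in a small helper. The sketch below is only one way to do it: the class name `HtmlTrim` and method name `StripOuterTags` are made up for illustration, and the optional `\s*` around the anchors is an added assumption so that leading or trailing whitespace does not prevent a match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTrim
{
    // Matches one tag at the very start of the string (optional leading whitespace).
    private static readonly Regex LeadingTag = new Regex(@"^\s*<[^<>]*>");

    // Matches one tag at the very end of the string (optional trailing whitespace).
    private static readonly Regex TrailingTag = new Regex(@"<[^<>]*>\s*$");

    // Hypothetical helper: removes the first tag at the start and the last tag
    // at the end of the input, leaving everything in between untouched.
    public static string StripOuterTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        string withoutStart = LeadingTag.Replace(html, "");
        return TrailingTag.Replace(withoutStart, "");
    }
}
```

Calling `HtmlTrim.StripOuterTags("<p>Some text</p><p>More text</p>")` returns `Some text</p><p>More text`, and `HtmlTrim.StripOuterTags("<div class=\"box.large\">Hi</div>")` returns `Hi`. Like any regex approach, this assumes attribute values do not themselves contain `<` or `>`; markup that can contain those is better handled with a real HTML parser.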