Tags: c#, vb.net, html-agility-pack, html-entities, urldecode

Decoding entire HTML entities at once


I want to decode HTML or plain text. I tried these functions, all with the same result:

  • HtmlEntity.DeEntitize
  • HttpUtility.HtmlDecode
  • WebUtility.HtmlDecode

For example, when I try to decode Martian&amp;#39;s atmosphere, I get Martian&#39;s atmosphere instead of Martian's atmosphere.

When I use this code, for example, everything works (the characters are fully decoded):

    ' The text is encoded twice, so it takes two decoding passes.
    TextBox1.Text = "Martian&amp;#39;s atmosphere"
    For i = 1 To 2
        TextBox1.Text = WebUtility.HtmlDecode(TextBox1.Text)
    Next

The problem is that I would rather not use a loop, because sometimes I have to decode a full HTML page or a long text.


Solution

  • It sounds like you can't know in advance how many times a string needs to be decoded before you get the result you want, so you're going to have to use either a loop or recursion. Here's a recursive C# function to do it:

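    // Requires: using System.Net;
    // Decodes repeatedly until a pass leaves the string unchanged.
    static string DecodeUntilUnchanged(string str)
    {
        string decoded = WebUtility.HtmlDecode(str);
        if (decoded == str)
            return str;
        return DecodeUntilUnchanged(decoded);
    }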
    

    You'd use it like this:

    TextBox1.Text = DecodeUntilUnchanged(TextBox1.Text);
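
If you'd rather avoid recursion, the same idea can be written as a bounded loop. This is only a minimal sketch: the class name `HtmlDecoding`, the method name `DecodeUntilStable`, and the default cap of 10 passes are all assumptions made for illustration, not part of any library. Each pass is a single call to `WebUtility.HtmlDecode`, and the loop stops one pass after the last encoding layer is removed, so a double-encoded page costs three passes no matter how long it is.

```csharp
using System;
using System.Net;

public static class HtmlDecoding
{
    // Hypothetical helper: decodes HTML entities repeatedly until a pass
    // changes nothing, or until maxPasses passes have run.
    // The cap of 10 is an arbitrary safety limit, not a recommended value.
    public static string DecodeUntilStable(string text, int maxPasses = 10)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        string current = text;
        for (int pass = 0; pass < maxPasses; pass++)
        {
            string decoded = WebUtility.HtmlDecode(current);
            if (decoded == current)
                return current;   // nothing left to decode
            current = decoded;    // one encoding layer removed; try again
        }
        return current;           // cap reached; return what we have
    }
}
```

You'd call it the same way, for example `TextBox1.Text = HtmlDecoding.DecodeUntilStable(TextBox1.Text);`. Keep in mind that decoding until nothing changes also unescapes text that was meant to stay escaped, such as a page that shows `&amp;lt;` literally, so it only suits input you know was over-encoded.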