HtmlAgilityPack and HtmlDecode


I am currently using HtmlAgilityPack in a console application to scrape a website. Since the HTML is encoded (it returns encoded characters like &#39;), I have to decode it before I save the content to my database.

Is there a way to decode the returned html using HtmlAgilityPack without having to use HttpUtility.HtmlDecode? I want to avoid adding System.Web to my console application if possible.


Solution

  • The Html Agility Pack is equipped with a utility class called HtmlEntity. It has a static method with the following signature:

    /// <summary>
    /// Replace known entities by characters.
    /// </summary>
    /// <param name="text">The source text.</param>
    /// <returns>The result text.</returns>
    public static string DeEntitize(string text)
    

    It supports well-known named entities (like &nbsp;) as well as numeric character references such as &#039;.
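
    As a minimal sketch of how this might look in a console application, the example below loads a small HTML fragment, reads a node's text, and passes it through HtmlEntity.DeEntitize. The sample markup and the //p XPath are hypothetical and only there for illustration; the sketch assumes the HtmlAgilityPack NuGet package is referenced and does not use System.Web.

    ```csharp
    using System;
    using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

    class Program
    {
        static void Main()
        {
            // Hypothetical fragment standing in for the scraped page.
            var doc = new HtmlDocument();
            doc.LoadHtml("<p>It&#39;s Tom &amp; Jerry</p>");

            // Read the text of the first <p> node; it may still contain entities.
            string raw = doc.DocumentNode.SelectSingleNode("//p").InnerText;

            // Replace known entities with the characters they stand for.
            string decoded = HtmlEntity.DeEntitize(raw);

            Console.WriteLine(decoded);
        }
    }
    ```

    The decoded string can then be saved to the database as-is. It is worth spot-checking a few real pages, since less common entities may not all be handled the same way.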