c# asp.net-core html-agility-pack

How can I read a web page with Html Agility Pack?


I want to read a web page with this library, but some pages run a script first (for example, to show "Welcome" or "Site is loading" or …) and only then display the actual page content. On such pages, the library gives me only that first stage, not the final content.

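using System.Linq;
using HtmlAgilityPack;

string linkUrl = "https://yoursite.com"; // placeholder; HtmlWeb.Load needs an absolute URL
var doc = new HtmlWeb().Load(linkUrl);
// Collect the non-empty text of every <p> element
var pTags = doc.DocumentNode.Descendants("p")
    .Select(el => el.InnerText)
    .Where(text => !string.IsNullOrEmpty(text));
// Any code that uses pTags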


Solution

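  • Html Agility Pack only parses the HTML it is given; it does not execute JavaScript. If a page builds its content with scripts, HtmlWeb.Load sees only the initial markup, so you need something that can run those scripts. Of the libraries often suggested as alternatives:

    1. AngleSharp is a fast, extensible, and well-documented HTML parser that is tolerant of malformed HTML. On its own it does not run scripts, but the separate AngleSharp.Js extension adds experimental JavaScript support, which may not be enough for script-heavy pages.
    2. CsQuery is a lightweight jQuery port for .NET that selects elements with CSS selectors, not XPath. It does not execute JavaScript, and it no longer appears to be actively maintained.
    3. dotless is a LESS CSS compiler for .NET, not an HTML parser, so it does not help here.

    If the page depends heavily on scripts, a common approach is to render it in a headless browser (for example with Selenium WebDriver, PuppeteerSharp, or Playwright for .NET) and then pass the resulting HTML to Html Agility Pack.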
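
    If none of the parsers above runs the page's scripts well enough, one common workaround is to let a real browser render the page and then parse the result with Html Agility Pack. The following is a minimal sketch of that idea, not a definitive implementation. It assumes the Selenium.WebDriver, Selenium.Support, and HtmlAgilityPack NuGet packages and a locally installed Chrome. The names RenderedPageReader, ExtractParagraphs, and GetRenderedHtml are made up for illustration, and the wait condition (at least one p element exists) is only a guess at what "page finished loading" means for your site.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// Hypothetical helper class; the names are illustrative only.
public static class RenderedPageReader
{
    // Parsing step: works on any HTML string, rendered or not.
    public static List<string> ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("p")
            // Decode entities such as &amp; and strip surrounding whitespace
            .Select(p => HtmlEntity.DeEntitize(p.InnerText).Trim())
            .Where(text => !string.IsNullOrWhiteSpace(text))
            .ToList();
    }

    // Rendering step: a headless Chrome runs the page's scripts first.
    public static string GetRenderedHtml(string url, TimeSpan timeout)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless=new");

        using var driver = new ChromeDriver(options);
        driver.Navigate().GoToUrl(url);

        // Assumption: the real content is ready once a <p> element exists.
        // Replace this with a condition specific to the target page.
        var wait = new WebDriverWait(driver, timeout);
        wait.Until(d => d.FindElements(By.TagName("p")).Count > 0);

        // PageSource reflects the DOM after the scripts have run.
        return driver.PageSource;
    }
}
```

    A call such as RenderedPageReader.ExtractParagraphs(RenderedPageReader.GetRenderedHtml("https://yoursite.com", TimeSpan.FromSeconds(15))) would then play the role of pTags in the question. Keeping the parsing step separate from the rendering step means the existing Html Agility Pack code stays unchanged and only the source of the HTML differs.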