
HtmlAgilityPack SelectSingleNode throws "System.NullReferenceException"


This is my code:

    var html = webBrowser1.DocumentText;

    HtmlWeb web = new HtmlWeb();

    var htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);

    var node = htmlDoc.DocumentNode.SelectSingleNode("/html/body/div/div/div/div/section/section/div/div/div/div").Attributes["class"].Value;

    Console.WriteLine("Node Name: " + node);

So far everything works fine, but if I append another "/div" to the XPath passed to SelectSingleNode, it stops working (error message: "Exception thrown: System.NullReferenceException"), although the HTML does contain another div at that position.

I think the cause is the "::before" that appears in front of the next div, but I only see it when I inspect the page in the browser.

A part of the HTML code:

 <div class="un-page__body">
    <div class="container-fluid">
       ::before
    <div class="row">
       ::before
       <div class="col-sm-6">

Solution

  • The HTML you see in the browser's Dev Tools (F12) can be very different from the HTML that HtmlAgilityPack, or any other scraping tool that does not execute scripts, receives.

    Reason

    Your code doesn't work, and won't work, because there are only two div tags in the entire document. /html/body/div will work because there are two of these, and that's it. The rest is just JavaScript. For any deeper path, SelectSingleNode finds no match and returns null, so reading .Attributes on that result throws the NullReferenceException.

    When you load the URL in Chrome, the browser downloads the page, executes its scripts, and then shows you the DOM that those scripts rendered.

    The body of the URL you provided contains only scripts, which execute and generate the divs you are seeing in Dev Tools. At this time, HtmlAgilityPack is NOT able to execute scripts, so it cannot produce the rendered HTML for you to scrape.
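
To make the failure mode concrete, here is a minimal sketch of a null-safe lookup. The class name NullSafeLookup, the helper Describe, and the HTML string are made up for illustration; the string is a stand-in for the static source, not the real page. It relies only on SelectSingleNode returning null when nothing matches and on the attribute indexer returning null when the attribute is absent.

```csharp
using System;
using HtmlAgilityPack;

public static class NullSafeLookup
{
    // Hypothetical helper: describes the first node matching xpath,
    // or reports that nothing matched instead of throwing.
    public static string Describe(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return "no match for " + xpath;
        }

        // The attribute indexer returns null when the attribute is absent.
        string cls = node.Attributes["class"]?.Value ?? "(no class attribute)";
        return node.Name + ": " + cls;
    }

    public static void Main()
    {
        // Assumed stand-in for the static source: the scripts have not run,
        // so only the placeholder div exists.
        string html = "<html><body><div id=\"app\">WebUntis wird geladen ...</div></body></html>";

        Console.WriteLine(Describe(html, "/html/body/div"));
        Console.WriteLine(Describe(html, "/html/body/div/div"));
    }
}
```

A check like this does not make the missing divs appear; it only turns the exception into a readable message, so you can tell that the XPath matched nothing.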

    What you get in HtmlAgilityPack

    When you inspect doc.DocumentNode, you only see this:

    <div id="app">
        WebUntis wird geladen ...
    </div>
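
One way to confirm this yourself is to count the divs HtmlAgilityPack actually parsed and dump the document it holds. The following is a sketch under assumptions: the class name StaticSourceCheck, the helper CountDivs, and the HTML string are made up for illustration, with the string modeled on the snippet above. In your program you would pass webBrowser1.DocumentText instead.

```csharp
using System;
using HtmlAgilityPack;

public static class StaticSourceCheck
{
    // Hypothetical helper: counts the div elements HtmlAgilityPack sees in the given HTML.
    public static int CountDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div");
        return divs?.Count ?? 0;
    }

    public static void Main()
    {
        // Assumed stand-in for the static source, modeled on the snippet above.
        string html = "<html><body><div id=\"app\">WebUntis wird geladen ...</div></body></html>";

        Console.WriteLine("div count: " + CountDivs(html));

        // Dump what was parsed, to compare against what Dev Tools shows.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

If the count is far lower than what Dev Tools shows, the missing elements were generated by scripts and are not in the HTML that HtmlAgilityPack was given.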
    

    Chrome / IE will show something else because what they display is the result after the scripts have run. What you are looking to do is run the scripts in HtmlAgilityPack, which is not something you can do at this time.

    What you see in Chrome / Browser

    <div id="app">
        <div style="height: 100%;">
            <div class="un-app">
                <nav class="un-app-header navbar navbar-default">
                    <div class="container-fluid">
    ...