Tags: c#, parsing, xpath, html-agility-pack, nullreferenceexception

Html Agility Pack XPath not working


So what I'm trying to do is parse an HTML document using Html Agility Pack. Loading the HTML document works. The problem starts when I try to query it with XPath: I get a "System.NullReferenceException: 'Object reference not set to an instance of an object.'" error.

To get my XPath, I open Chrome's Developer Tools, select the whole table whose rows contain the data I want to parse, right-click it and choose Copy XPath.
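A likely reason a path copied this way fails in Html Agility Pack: Chrome builds the XPath from its live DOM, and when a `<tr>` appears directly inside `<table>` in the source, the browser's parser inserts an implied `<tbody>`. Html Agility Pack parses the raw HTML as the server sent it and does not add `<tbody>`, so a copied path containing `/tbody/` can match nothing. The sketch below uses a small hypothetical HTML string (not the real page) and a hypothetical helper `HasMatch` to compare the forms.

```csharp
using System;
using HtmlAgilityPack;

public static class TbodyDemo
{
    // Hypothetical markup: rows written directly inside <table>, with no <tbody>.
    // A browser would add <tbody> to its DOM; Html Agility Pack keeps the source as-is.
    public const string Html =
        "<html><body><table id=\"Table2\">" +
        "<tr><td>Smith</td><td>James</td></tr>" +
        "</table></body></html>";

    // True when the XPath matches at least one node.
    // By default SelectNodes returns null (not an empty collection) when nothing matches.
    public static bool HasMatch(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectNodes(xpath) != null;
    }

    public static void Main()
    {
        // Path shaped like Chrome's "Copy XPath" output: includes tbody.
        Console.WriteLine(HasMatch(Html, "//*[@id='Table2']/tbody/tr")); // False
        // Same path with the tbody step removed.
        Console.WriteLine(HasMatch(Html, "//*[@id='Table2']/tr"));       // True
        // Descendant axis: matches with or without tbody.
        Console.WriteLine(HasMatch(Html, "//*[@id='Table2']//tr"));      // True
    }
}
```

Dropping the `tbody` steps from a copied path, or replacing a step with `//`, is often enough; a long layout path like the one in the question is still worth checking against the raw HTML (for example the string the commented-out `Console.WriteLine(htmlResult)` would print).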

Here's my code

using System;
using System.Net;
using HtmlAgilityPack;

string url = "https://www.ctbiglist.com/index.asp";
string myPara = "LastName=Smith&FirstName=James&PropertyID=&Submit=Search+Properties";
string htmlResult;

// Get the raw HTML from the website
using (WebClient client = new WebClient())
{
    client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";

    // POST the form fields (LastName, FirstName, PropertyID, Submit) to the URL
    htmlResult = client.UploadString(url, myPara);

    //Console.WriteLine(htmlResult);
}

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlResult);

HtmlNodeCollection table = doc.DocumentNode.SelectNodes("//*[@id=\"Table2\"]/tbody/tr[2]/td/table/tbody/tr/td/div[2]/table/tbody/tr[2]/td/table/tbody/tr[2]/td/form/div/table[1]/tbody/tr");

// table is null when the XPath matches nothing, so this line throws
Console.WriteLine(table.Count);
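The exception most likely comes from the last line of the code above rather than from the XPath evaluation itself: by default `SelectNodes` returns `null` when the expression matches nothing, so `table.Count` dereferences `null`. A minimal sketch of a guard, using a hypothetical helper name `SelectRows` and hypothetical markup, that turns "no match" into an empty list so the caller can report it instead of crashing:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SafeSelect
{
    // Hypothetical helper: wraps SelectNodes so "no match" yields an empty list
    // instead of null, avoiding a NullReferenceException on .Count.
    public static List<HtmlNode> SelectRows(HtmlNode context, string xpath)
    {
        HtmlNodeCollection nodes = context.SelectNodes(xpath);
        if (nodes == null) return new List<HtmlNode>();
        return new List<HtmlNode>(nodes);
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical markup standing in for the downloaded page.
        doc.LoadHtml("<table id=\"Table2\"><tr><td>Smith</td></tr></table>");

        string xpath = "//*[@id='Table2']/tbody/tr";
        List<HtmlNode> rows = SelectRows(doc.DocumentNode, xpath);

        if (rows.Count == 0)
            Console.WriteLine("No nodes matched: " + xpath);
        else
            Console.WriteLine(rows.Count);
    }
}
```

The guard does not fix the path; it only makes a wrong path show up as a clear message rather than an exception.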

The following code runs, but it grabs every table in the HTML document:

var query = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
        from row in table.SelectNodes("//tr").Cast<HtmlNode>()
        from cell in row.SelectNodes("//th|td").Cast<HtmlNode>()
        select new { Table = table.Id, CellText = cell.InnerText };

foreach (var cell in query)
{
     Console.WriteLine("{0}: {1}", cell.Table, cell.CellText);
}
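Part of why the query above returns so much: in XPath a leading `//` always starts from the document root, even when `SelectNodes` is called on a specific node, so `table.SelectNodes("//tr")` returns every row in the page, not just that table's rows. A path relative to the current node starts with `.` (`./tr`, `.//tr`). Likewise `"//th|td"` is parsed as `//th | td`: every `th` in the document plus the row's own `td` children. The sketch below uses hypothetical two-table markup and a hypothetical helper `Count` to compare the forms.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPathDemo
{
    // Hypothetical page: one row in the first table, two in the second.
    public const string Html =
        "<html><body>" +
        "<table id=\"first\"><tr><th>Name</th></tr></table>" +
        "<table id=\"second\"><tr><td>A</td></tr><tr><td>B</td></tr></table>" +
        "</body></html>";

    // Counts matches of `xpath` evaluated with the node found by `contextXPath` as context.
    public static int Count(string contextXPath, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        HtmlNode context = doc.DocumentNode.SelectSingleNode(contextXPath);
        return context.SelectNodes(xpath)?.Count ?? 0;
    }

    public static void Main()
    {
        // Rows, evaluated from the first table.
        Console.WriteLine(Count("//table[@id='first']", "//tr"));  // 3: leading // restarts at the root
        Console.WriteLine(Count("//table[@id='first']", ".//tr")); // 1: only this table's rows

        // Cells, evaluated from the first row of the second table.
        Console.WriteLine(Count("//table[@id='second']/tr[1]", "//th|td"));   // 2: the page's th plus this row's td
        Console.WriteLine(Count("//table[@id='second']/tr[1]", "./th|./td")); // 1: only this row's cells
    }
}
```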

What I want is one specific table: the one whose rows hold the data I want to parse into objects.

Thanks for the help!!!


Solution

  • Change the line

    from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()

    to

    from table in doc.DocumentNode.SelectNodes("//table[@id=\"Table2\"]").Cast<HtmlNode>()

    This selects only the table with the given id. If the tables are nested, though, you have to change the XPath accordingly to get the nested table's rows.
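A sketch that combines selecting the table by id with relative paths and maps each row into an object. The markup, the `Listing` class and its two fields, and the `TableParser.Parse` helper are all hypothetical; the real page's columns would have to be read from its HTML. Judging by the copied path in the question, the wanted table sits several levels inside `Table2`, so the id (or path) used may need to point at that inner table instead. `./tr | ./tbody/tr` takes only the table's own rows, with or without `tbody`, so rows of tables nested inside it are skipped; `.//tr` would include them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical shape for one data row of the target table.
public class Listing
{
    public string Name { get; set; }
    public string Address { get; set; }
}

public static class TableParser
{
    // Hypothetical markup: header row, two data rows, and a row that only wraps a nested table.
    public const string SampleHtml =
        "<table id=\"Table2\"><tbody>" +
        "<tr><th>Name</th><th>Address</th></tr>" +
        "<tr><td>Smith, James</td><td>1 Main St</td></tr>" +
        "<tr><td>Smith, Jane</td><td>2 Elm St</td></tr>" +
        "<tr><td colspan=\"2\"><table><tr><td>x</td><td>y</td></tr></table></td></tr>" +
        "</tbody></table>";

    // Parses the direct rows of the table with the given id (id assumed to contain no quotes).
    // Returns an empty list if the table or its rows are not found.
    public static List<Listing> Parse(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode($"//table[@id='{tableId}']");
        if (table == null) return new List<Listing>();

        // Direct rows only (with or without <tbody>); the nested table's row is not selected.
        HtmlNodeCollection rows = table.SelectNodes("./tr | ./tbody/tr");
        if (rows == null) return new List<Listing>();

        return rows
            .Select(row => row.SelectNodes("./td"))
            // Skips the <th> header row and the single-cell wrapper row.
            .Where(cells => cells != null && cells.Count >= 2)
            .Select(cells => new Listing
            {
                Name = HtmlEntity.DeEntitize(cells[0].InnerText).Trim(),
                Address = HtmlEntity.DeEntitize(cells[1].InnerText).Trim()
            })
            .ToList();
    }

    public static void Main()
    {
        foreach (Listing item in Parse(SampleHtml, "Table2"))
            Console.WriteLine($"{item.Name} | {item.Address}");
        // Smith, James | 1 Main St
        // Smith, Jane | 2 Elm St
    }
}
```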