c# linq select html-agility-pack

HtmlAgilityPack SelectNodes returns null


I used the following code to extract information from a page, but the site has since changed and my application now fails with a null reference error.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
var query = doc.DocumentNode
  .SelectNodes("//table[@class='table table-striped table-hover']/tr")
  .Select(r => {
    return new DelegationLink()
    {
        Row = r.SelectSingleNode(".//td").InnerText,
        Category = r.SelectSingleNode(".//td[2]").InnerText
    };
}).ToList();

This is my HTML:

 <div role="tabpanel" class="tab-pane fade " id="tab3">
                <div class="circular-div">
    <table class="table table-striped table-hover" id="circular-table">
        <thead>
            <tr>
                <th>ردیف</th>
                <th>دسته بندی</th>
                <th>عنوان</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>1</td>
                <td>بخشنامه‌ها</td>
                <td>اطلاعیه جهاد دانشگاهی</td>
            </tr>
            <tr>
                <td>2</td>
                <td>بخشنامه‌ها</td>
...
...
...

What am I doing wrong?


Solution

  • The table rows are not direct children of the table: they are nested inside thead and tbody, so the path table/tr matches nothing. SelectNodes returns null when no node matches, which is why your code fails. Going through tbody also skips the header row, so only the body of the table is scraped.

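    var query = doc.DocumentNode
        // Go through tbody: the rows are its children, and the header row in thead is skipped.
        .SelectNodes("//table[@class='table table-striped table-hover']/tbody/tr")
        .Select(r => new DelegationLink
        {
            Row = r.SelectSingleNode("./td[1]").InnerText,      // first cell of the row
            Category = r.SelectSingleNode("./td[2]").InnerText  // second cell of the row
        })
        .ToList();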
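
Because SelectNodes returns null instead of an empty collection when nothing matches, the query can break again the next time the site changes its markup. The following is a minimal defensive sketch, not a definitive implementation. It assumes a DelegationLink class with two string properties (the question does not show the real class), uses a hypothetical helper named CircularTableParser, and selects the table by the id="circular-table" attribute from the posted HTML rather than by its class list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Assumed shape: the question does not show the real DelegationLink class.
public class DelegationLink
{
    public string Row { get; set; }
    public string Category { get; set; }
}

// Hypothetical helper, introduced only for illustration.
public static class CircularTableParser
{
    public static List<DelegationLink> Parse(string page)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(page);

        // Select by id (taken from the posted HTML) and go through tbody
        // so the header row in thead is not included.
        var rows = doc.DocumentNode
            .SelectNodes("//table[@id='circular-table']/tbody/tr");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (rows == null)
            return new List<DelegationLink>();

        return rows
            .Select(r => new DelegationLink
            {
                // ?. leaves the property null if a row is missing that cell.
                Row = r.SelectSingleNode("./td[1]")?.InnerText.Trim(),
                Category = r.SelectSingleNode("./td[2]")?.InnerText.Trim()
            })
            .ToList();
    }
}
```

With this in place, var query = CircularTableParser.Parse(page); yields an empty list when the table is missing, so the caller can report the layout change instead of crashing.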