c# web-scraping html-agility-pack scrapysharp

Extracting a table using ScrapySharp


I am trying to scrape a table from a website; my current attempt and the relevant HTML are below. How do I extract the table using ScrapySharp?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using HtmlAgilityPack;
using ScrapySharp.Extensions;
using ScrapySharp.Network;

namespace Scrape
{
    class Program
    {
        static void Main(string[] args)
        {
            ScrapingBrowser browser = new ScrapingBrowser();

            //set UseDefaultCookiesParser as false if a website returns invalid cookies format
            //browser.UseDefaultCookiesParser = false;

            WebPage homePage = browser.NavigateToPage(new Uri("http://www.nasdaq.com/earnings/earnings-calendar.aspx"));
            var divs = homePage.Html.CssSelect("div");  //all div elements
            var trs = homePage.Html.SelectNodes("//div")
                .Where(n => !String.IsNullOrEmpty(n.GetAttributeValue("class"))
                //(n.GetAttributeValue("class") == "genTable")
                );                       
        }
    }
}

This is the relevant part of the HTML:

 <div class="clearB"></div>

        <div class="genTable">
            <div id="_confirmed" >
                <!--<div class="floatL">
                    <h3>Earnings Date - Confirmed by Zacks</h3>
                </div>
                <div class="clearB"></div>
                <br />-->

                <div id="two_column_main_content_pnlInsider">


                            <table class="USMN_EarningsCalendar" id="ECCompaniesTable" border="0"cellpadding="0" cellspacing="0">
                                <thead>
                                <tr>
                                    <th>
                                        <a href="javascript:void(0);" onclick="getdata('earningtype',1)">Time</a>
                                    </th>
                                    <th>
                                        Company Name (Symbol) <br /> Market Cap<br />
                                        Sort by: <a href="javascript:void(0);" onclick="getdata('name',1)">Name</a> / <a href="javascript:void(0);" onclick="getdata('marketcap',1)">Size</a>
                                    </th>
                                    <th>
                                        Expected Report Date
                                    </th>
                                    <th>
                                        Fiscal<br />
                                        Quarter<br />
                                        Ending
                                    </th>
                                    <th>
                                        <span id="two_column_main_content_CompanyTable_EPS">
                                            Consensus<br />
                                            EPS* Forecast
                                        </span>
                                    </th>
                                    <th>
                                        # of Ests
                                    </th>
                                    <th style="">
                                        <span id="two_column_main_content_CompanyTable_previousreportdate">
                                            Last Year's<br />
                                            Report Date
                                        </span>
                                    </th>
                                    <th>
                                        Last Year's EPS*
                                    </th>
                                    <th style="display:none">
                                        % Suprise<br />
                                    </th>
                                    </tr>
                                </thead>

                            <tr>
                                <td>
                                    <a href="http://www.nasdaq.com/symbol/abb/premarket" title="Pre-market Quotes"><img src="http://www.nasdaq.com/images/weather_sun.jpg" alt="Pre-Market Quotes" height="16" width="16"></a> 
                                </td>
                                <td>
                                    <a id="two_column_main_content_CompanyTable_companyname_0" href="http://www.nasdaq.com/earnings/report/abb">ABB Ltd (ABB) <br/><b>Market Cap: $47.63B</b></a> 
                                </td>
                                <td>
                                    04/20/2017
                                </td>
                                <td>
                                    Mar 2017
                                </td>
                                <td>
                                    $0.25
                                </td>
                                <td>
                                    2
                                </td>
                                <td style="">
                                    04/20/2016
                                </td>
                                <td>
                                    $0.23
                                </td>
                                <td style="display:none">
                                    <span style='color:green'>Met</span>
                                </td>
                            </tr>

                            <tr>
                                <td>
                                    <a href="http://www.nasdaq.com/symbol/acu/premarket" title="Pre-market Quotes"><img src="http://www.nasdaq.com/images/weather_sun.jpg" alt="Pre-Market Quotes" height="16" width="16"></a> 
                                </td>
                                <td>
                                    <a id="two_column_main_content_CompanyTable_companyname_1" href="http://www.nasdaq.com/earnings/report/acu">Acme United Corporation. (ACU) <br/><b>Market Cap: $92.5M</b></a> 
                                </td>
                                <td>
                                    04/20/2017
                                </td>
                                <td>
                                    Mar 2017
                                </td>
                                <td>
                                    $0.18
                                </td>
                                <td>
                                    1
                                </td>
                                <td style="">
                                    04/22/2016
                                </td>
                                <td>
                                    $0.16
                                </td>
                                <td style="display:none">
                                    <span style='color:green'>Met</span>
                                </td>
                            </tr>

             ................

Solution

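  • Something like the following should work, using HtmlAgilityPack directly:

    // Requires: using System; using System.Linq; using HtmlAgilityPack;
    var hw = new HtmlWeb();
    HtmlDocument doc = hw.Load("http://www.nasdaq.com/earnings/earnings-calendar.aspx");

    // Find the table by its id; FirstOrDefault returns null if it is not in the page
    HtmlNode table = doc.DocumentNode.Descendants("table")
        .FirstOrDefault(t => t.Id == "ECCompaniesTable");

    if (table != null)
    {
        foreach (HtmlNode row in table.Descendants("tr"))
        {
            Console.WriteLine(row.InnerText);
        }
    }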
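
Since the question asks about ScrapySharp specifically, here is a minimal sketch of the same idea using its `CssSelect` extension, splitting each row into cleaned-up cell values instead of printing the raw `InnerText`. The class name `EarningsTableSketch` and the helpers `ExtractRows` and `Clean` are made up for illustration, and the sketch assumes the table is present in the HTML the server returns, as in the snippet from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

// Hypothetical helper class, not part of ScrapySharp or HtmlAgilityPack.
static class EarningsTableSketch
{
    // Returns one list of cell strings per data row of the table with the given id.
    public static List<List<string>> ExtractRows(HtmlNode root, string tableId)
    {
        // "#id" selects the element with that id attribute
        HtmlNode table = root.CssSelect("#" + tableId).FirstOrDefault();
        if (table == null)
            return new List<List<string>>();   // no table with that id was found

        return table.CssSelect("tr")
            .Select(tr => tr.CssSelect("td")
                .Select(td => Clean(td.InnerText))
                .ToList())
            .Where(cells => cells.Count > 0)   // header rows hold only <th>, so they drop out
            .ToList();
    }

    // Decodes HTML entities and collapses runs of whitespace into single spaces.
    static string Clean(string text)
    {
        return Regex.Replace(HtmlEntity.DeEntitize(text), @"\s+", " ").Trim();
    }
}
```

With the `homePage` from the question, this could be called as `EarningsTableSketch.ExtractRows(homePage.Html, "ECCompaniesTable")`; each inner list then holds the cells of one row in column order, including the hidden last column.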