I have data from a website that I am attempting to scrape. The data looks like below. How do I extract the table
using scrapysharp?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;
using ScrapySharp.Extensions;
using ScrapySharp.Network;
namespace Scrape
{
class Program
{
static void Main(string[] args)
{
ScrapingBrowser browser = new ScrapingBrowser();
//set UseDefaultCookiesParser as false if a website returns invalid cookies format
//browser.UseDefaultCookiesParser = false;
WebPage homePage = browser.NavigateToPage(new Uri("http://www.nasdaq.com/earnings/earnings-calendar.aspx"));
var divs = homePage.Html.CssSelect("div"); //all div elements
var trs = homePage.Html.SelectNodes("//div")
.Where(n => !String.IsNullOrEmpty(n.GetAttributeValue("class"))
//(n.GetAttributeValue("class") == "genTable")
);
}
}
}
This is the relevant part of the html
:
<div class="clearB"></div>
<div class="genTable">
<div id="_confirmed" >
<!--<div class="floatL">
<h3>Earnings Date - Confirmed by Zacks</h3>
</div>
<div class="clearB"></div>
<br />-->
<div id="two_column_main_content_pnlInsider">
<table class="USMN_EarningsCalendar" id="ECCompaniesTable" border="0"cellpadding="0" cellspacing="0">
<thead>
<tr>
<th>
<a href="javascript:void(0);" onclick="getdata('earningtype',1)">Time</a>
</th>
<th>
Company Name (Symbol) <br /> Market Cap<br />
Sort by: <a href="javascript:void(0);" onclick="getdata('name',1)">Name</a> / <a href="javascript:void(0);" onclick="getdata('marketcap',1)">Size</a>
</th>
<th>
Expected Report Date
</th>
<th>
Fiscal<br />
Quarter<br />
Ending
</th>
<th>
<span id="two_column_main_content_CompanyTable_EPS">
Consensus<br />
EPS* Forecast
</span>
</th>
<th>
# of Ests
</th>
<th style="">
<span id="two_column_main_content_CompanyTable_previousreportdate">
Last Year's<br />
Report Date
</span>
</th>
<th>
Last Year's EPS*
</th>
<th style="display:none">
% Suprise<br />
</th>
</tr>
</thead>
<tr>
<td>
<a href="http://www.nasdaq.com/symbol/abb/premarket" title="Pre-market Quotes"><img src="http://www.nasdaq.com/images/weather_sun.jpg" alt="Pre-Market Quotes" height="16" width="16"></a>
</td>
<td>
<a id="two_column_main_content_CompanyTable_companyname_0" href="http://www.nasdaq.com/earnings/report/abb">ABB Ltd (ABB) <br/><b>Market Cap: $47.63B</b></a>
</td>
<td>
04/20/2017
</td>
<td>
Mar 2017
</td>
<td>
$0.25
</td>
<td>
2
</td>
<td style="">
04/20/2016
</td>
<td>
$0.23
</td>
<td style="display:none">
<span style='color:green'>Met</span>
</td>
</tr>
<tr>
<td>
<a href="http://www.nasdaq.com/symbol/acu/premarket" title="Pre-market Quotes"><img src="http://www.nasdaq.com/images/weather_sun.jpg" alt="Pre-Market Quotes" height="16" width="16"></a>
</td>
<td>
<a id="two_column_main_content_CompanyTable_companyname_1" href="http://www.nasdaq.com/earnings/report/acu">Acme United Corporation. (ACU) <br/><b>Market Cap: $92.5M</b></a>
</td>
<td>
04/20/2017
</td>
<td>
Mar 2017
</td>
<td>
$0.18
</td>
<td>
1
</td>
<td style="">
04/22/2016
</td>
<td>
$0.16
</td>
<td style="display:none">
<span style='color:green'>Met</span>
</td>
</tr>
................
I would imagine something like this code
var hw = new HtmlWeb();
doc = hw.Load("http://www.nasdaq.com/earnings/earnings-calendar.aspx");
foreach (HtmlNode row in doc.DocumentNode.Descendants("table").FirstOrDefault(_ => _.Id.Equals("ECCompaniesTable")).Descendants("tr"))
{
Console.WriteLine(row.InnerText);
}