c# html web-scraping html-agility-pack scrapysharp

How to use ScrapySharp to parse elements in an html document?


Here's the project's official "documentation":

https://bitbucket.org/rflechner/scrapysharp/wiki/Home


No matter what I try, I can't find the CssSelect() method that the library is supposed to add to make querying things easier. Here's what I've tried:

using ScrapySharp.Core;
using ScrapySharp.Html.Parsing;
using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.stackoverflow.com");

var page = doc.DocumentNode.SelectSingleNode("//body");
page.CssSel???

How exactly do I use this library? The documentation doesn't make clear what type the html variable in its examples is.


Solution

  • Add

    using ScrapySharp.Extensions;
    

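    It looks like you're missing that directive. CssSelect() is an extension method defined in the ScrapySharp.Extensions namespace, so it only shows up on HtmlNode once that namespace is imported.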

    In case an example helps, here's a method I use in a project:

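    // Needs: using System; using System.Linq; using HtmlAgilityPack; using ScrapySharp.Extensions;
    private string GetPdfUrl(HtmlDocument document, string baseUrl)
    {
        // Select the single PDF download link, then resolve its (possibly relative) href against baseUrl
        var link = document.DocumentNode.CssSelect(".table-of-content .head-row td.download a.text-pdf").Single();
        return new Uri(new Uri(baseUrl), link.Attributes["href"].Value).ToString();
    }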
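Putting the fix and the method together, here is a minimal self-contained sketch. It assumes the HtmlAgilityPack and ScrapySharp NuGet packages are referenced. Instead of fetching a live page with HtmlWeb, it loads a hard-coded HTML string with LoadHtml. The markup, the question-title class, the base URL and the ScrapySharpDemo/GetLinkTexts names are all made up for illustration. It also bears on the type question: doc.DocumentNode (and the result of SelectSingleNode) is an HtmlAgilityPack HtmlNode, which appears to be what the html variable in the wiki examples is, and HtmlNode is the type CssSelect() extends.

```csharp
// Sketch: assumes the HtmlAgilityPack and ScrapySharp NuGet packages are referenced.
using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions; // the directive that makes CssSelect() visible

public static class ScrapySharpDemo
{
    // The question's snippet with the fix applied: CssSelect() called on an HtmlNode.
    // "question-title" is a hypothetical class name used only for this demo.
    public static string[] GetLinkTexts(HtmlDocument doc)
    {
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        return body.CssSelect(".question-title a")
                   .Select(a => a.InnerText.Trim())
                   .ToArray();
    }

    // Static copy of the answer's GetPdfUrl so Main can call it.
    public static string GetPdfUrl(HtmlDocument document, string baseUrl)
    {
        // Single() throws unless exactly one element matches the selector
        HtmlNode link = document.DocumentNode
            .CssSelect(".table-of-content .head-row td.download a.text-pdf")
            .Single();
        // Resolve a possibly relative href against the page's base URL
        return new Uri(new Uri(baseUrl), link.Attributes["href"].Value).ToString();
    }

    public static void Main()
    {
        // Hypothetical markup built to match both selectors; not taken from a real site.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body>" +
            "<h3 class='question-title'><a href='/q/1'>First</a></h3>" +
            "<h3 class='question-title'><a href='/q/2'>Second</a></h3>" +
            "<table class='table-of-content'><tr class='head-row'>" +
            "<td class='download'><a class='text-pdf' href='/files/paper.pdf'>PDF</a></td>" +
            "</tr></table>" +
            "</body></html>");

        foreach (var text in GetLinkTexts(doc))
            Console.WriteLine(text);
        Console.WriteLine(GetPdfUrl(doc, "https://example.com/articles/42"));
    }
}
```

Because GetPdfUrl ends in Single(), it throws InvalidOperationException when the selector matches no link or more than one. If the link may legitimately be missing, FirstOrDefault() followed by a null check is a gentler choice.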