c# html-parsing anglesharp

AngleSharp Parsing


I can't find many examples of using AngleSharp to select elements when there is no class name or id to match on.

HTML

<span><a href="google.com" title="Google"><span class="icon icon_none"></span></a></span>
<span><a href="bing.com" title="Bing"><span class="icon icon_none"></span></a></span>
<span><a href="yahoo.com" title="Yahoo"><span class="icon icon_none"></span></a></span>

I want to get the href of any <a> tag whose title is "Bing".

In Python BeautifulSoup I would use

item_needed = a_row.find('a', {'title': 'Bing'})

and then grab the href attribute

or jQuery

a[title='Bing']

But I'm stuck doing the same with AngleSharp, e.g. when following this example: https://github.com/AngleSharp/AngleSharp/wiki/Examples#getting-certain-elements

c# AngleSharp

var parser = new AngleSharp.Parser.Html.HtmlParser();
var document = parser.Parse(@"<span><a href=""google.com"" title=""Google""><span class=""icon icon_none""></span></a></span>< span >< a href = ""bing.com"" title = ""Bing"" >< span class=""icon icon_none""></span></a></span><span><a href = ""yahoo.com"" title=""Yahoo""><span class=""icon icon_none""></span></a></span>");

//Do something with LINQ
var blueListItemsLinq = document.All.Where(m => m.LocalName == "a" && //stuck);

Solution

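  • The problem is in your HTML markup: the spaces around the angle brackets keep AngleSharp from finding the target element. An HTML parser treats a < that is followed by a space as plain text rather than the start of a tag, so this fragment never produces an <a> element: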

    < span >< a href = ""bing.com"" title = ""Bing"" >< span class=""icon icon_none"">
    

    With the HTML fixed, both the LINQ query and the CSS selector select the target link:

    //AngleSharp 0.10 and later: HtmlParser lives in AngleSharp.Html.Parser and exposes ParseDocument
    //(0.9.x used AngleSharp.Parser.Html.HtmlParser with Parse instead)
    var parser = new AngleSharp.Html.Parser.HtmlParser();
    var document = parser.ParseDocument(@"<span><a href=""google.com"" title=""Google""><span class=""icon icon_none""></span></a></span><span><a href = ""bing.com"" title = ""Bing""><span class=""icon icon_none""></span></a></span><span><a href = ""yahoo.com"" title=""Yahoo""><span class=""icon icon_none""></span></a></span>");
    
    //LINQ example
    var blueListItemsLinq = document.All
                                    .Where(m => m.LocalName == "a" && 
                                                m.GetAttribute("title") == "Bing"
                                           );
    
    //CSS selector equivalent of the LINQ query above
    var blueListItemsCSS = document.QuerySelectorAll("a[title='Bing']");
    
    //print each matching link's href attribute value to the console
    foreach (var item in blueListItemsCSS)
    {
        Console.WriteLine(item.GetAttribute("href"));
    }
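
If the title to match comes from a variable rather than a literal, one option is to select every <a> and compare the attribute in LINQ, which avoids having to escape quotes inside a CSS selector string. The following is only a sketch, assuming AngleSharp 0.10 or later; the LinkFinder class and HrefsByTitle method are hypothetical names introduced here for illustration and are not part of AngleSharp.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

// Hypothetical helper for illustration; not part of the AngleSharp API.
public static class LinkFinder
{
    // Returns the raw href attribute of every <a> element whose
    // title attribute equals the given title (case-sensitive).
    public static string[] HrefsByTitle(string html, string title)
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        return document.QuerySelectorAll("a")
            // Compare in C# so the title never has to be embedded in a selector.
            .Where(a => a.GetAttribute("title") == title)
            // GetAttribute returns the value as written in the markup (or null).
            .Select(a => a.GetAttribute("href"))
            .ToArray();
    }
}
```

Called with the corrected markup and "Bing", this should return a single entry, bing.com. Called with the original markup that has spaces after the opening angle brackets, it should return an empty array, because that part of the input is parsed as text and no Bing link exists in the document.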