c# html anglesharp

How to find and replace href values on links using AngleSharp?


I have a snippet of HTML that contains some links whose hrefs start with a hash symbol (#), like the following:

<a href="#Getting Started">Getting Started</a>

I'm new to AngleSharp and am trying to use it to find these links, replace their hrefs with new values, and then return the updated HTML markup.


Solution

  • The beauty of AngleSharp is that you can essentially fall back on any JS solution, since AngleSharp exposes the W3C DOM API (which is also what JS uses). All you need to do is change the camelCase member names to PascalCase and use standard .NET tools instead of JS-specific ones.

    Let's take, for instance, "How to Change All Links with javascript" (sorry, it was the first hit on my Google search) and use it as a starting point.

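    using AngleSharp;
    using AngleSharp.Html.Dom;

    var context = BrowsingContext.New(Configuration.Default);
    // Pass your HTML string to Content(); it is left empty here as a placeholder.
    var document = await context.OpenAsync(res => res.Content(""));
    var anchors = document.GetElementsByTagName("a");
    
    for (var i = 0; i < anchors.Length; i++)
    {
        // Skip any "a" element that is not an HTML anchor.
        if (anchors[i] is IHtmlAnchorElement anchor)
        {
            anchor.Href = "http://example.com/?redirect=" + anchor.Href;
        }
    }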
    

    In our case we want a similar, but not identical, transformation. We could do:

    for (var i = 0; i < anchors.Length; i++)
    {
        // Check the raw attribute, not anchor.Href, which is already resolved to a full URL.
        if (anchors[i] is IHtmlAnchorElement anchor &&
            (anchor.GetAttribute("href")?.StartsWith("#") ?? false))
        {
            anchor.Href = "your-new-value";
        }
    }
    

    The reason for reading the attribute is that Href is always normalized (i.e., resolved to a full URL), so an attribute value of "#foo" may look like "http://example.com/path#foo". The raw attribute value, by contrast, is returned unchanged, so we can safely check whether it still starts with the hash symbol.
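
The question also asks for the updated markup back, which the snippets above stop short of. A minimal sketch of the whole round trip might look like the following. The names `HashLinkRewriter`, `RewriteHashLinksAsync` and `mapHref` are made up for illustration and are not part of AngleSharp. The sketch assumes the input is a body fragment, so it returns `Body.InnerHtml` rather than the full document, and it writes the new value with `SetAttribute` so the stored attribute is exactly the string supplied.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Html.Dom;

// Hypothetical helper, not part of AngleSharp.
public static class HashLinkRewriter
{
    // html:    an HTML fragment such as "<a href=\"#foo\">foo</a>"
    // mapHref: hypothetical callback that receives the raw "#..." value
    //          and returns the replacement href
    public static async Task<string> RewriteHashLinksAsync(
        string html, Func<string, string> mapHref)
    {
        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(req => req.Content(html));

        // Keep only the elements that really are HTML anchors.
        var anchors = document
            .GetElementsByTagName("a")
            .OfType<IHtmlAnchorElement>();

        foreach (var anchor in anchors)
        {
            // Read the raw attribute; anchor.Href would be normalized.
            var raw = anchor.GetAttribute("href");

            if (raw != null && raw.StartsWith("#", StringComparison.Ordinal))
            {
                anchor.SetAttribute("href", mapHref(raw));
            }
        }

        // The parser wraps a fragment in html/head/body, so the body's
        // inner markup corresponds to the fragment that was passed in.
        return document.Body?.InnerHtml ?? string.Empty;
    }
}
```

Calling it as `await HashLinkRewriter.RewriteHashLinksAsync(html, raw => "/docs/" + raw.Substring(1))` should rewrite only the links whose raw href starts with "#" and leave every other link as it was. If you need the complete document instead of the fragment, `document.ToHtml()` is the usual alternative.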