c# .net web-scraping html-agility-pack

How to remove an <a href> link tag without removing the link text in Html Agility Pack?


I need to replace a link tag using HAP (HTML Agility Pack) while keeping the link text. For example, in this case:

<p>This is <a href="mylink">the link</a></p>

I want to replace the link with a span, so the desired result is:

<p>This is <span>the link</span></p>

Solution

  • I wrote this function, which takes an HTML string as input and returns the HTML with every link replaced by a span.

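    // Requires: using HtmlAgilityPack;
    public string CleanLinks(string input)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(input);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//a");
        if (links == null) return input;

        foreach (HtmlNode link in links)
        {
            // Build a <span> that carries the same inner markup as the link.
            HtmlNode span = doc.CreateElement("span");
            span.InnerHtml = link.InnerHtml;

            // Swap the <a> for the <span> in the link's parent.
            link.ParentNode.ReplaceChild(span, link);
        }

        return doc.DocumentNode.OuterHtml;
    }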
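
If you would rather drop the tag entirely and keep only the text (no span wrapper), a variation along these lines might work. This is a minimal sketch, not part of the original answer: the class name LinkStripper and the method name StripLinks are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. Note that it keeps only the link's text, so any markup nested inside the link (for example <b>) is discarded.

```csharp
using System;
using HtmlAgilityPack;

public static class LinkStripper
{
    // Hypothetical helper: replaces each <a> element with a plain text node
    // containing the link's text, so the tag disappears but the text stays.
    public static string StripLinks(string input)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(input);

        // SelectNodes returns null when there are no matches.
        var links = doc.DocumentNode.SelectNodes("//a");
        if (links == null) return input;

        foreach (HtmlNode link in links)
        {
            // InnerText is the link's content with nested tags stripped.
            HtmlNode text = doc.CreateTextNode(link.InnerText);
            link.ParentNode.ReplaceChild(text, link);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Called with the HTML from the question, LinkStripper.StripLinks("<p>This is <a href=\"mylink\">the link</a></p>") should give back <p>This is the link</p>. Input without any links is returned unchanged.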