javascript c# html-agility-pack

How to convert unclickable plain text URLs to links in HTML source


I want to detect plain-text URLs in HTML source and turn them into links. I've searched Stack Overflow, but most answers are about detecting and converting URLs in plain text strings. Applying those to HTML produces invalid markup: for example, img src attributes get rewritten.

P.S. Close voters: please read the question carefully. It is not a duplicate.

For example, line 1 below needs to be converted; lines 2 and 3 do not.

<!-- Sample html source -->
<div>
   Line 1 : https://www.google.com/
   Line 2 : <a href="https://www.google.com/">https://www.google.com/</a>
   Line 3: <img src="http://a-domain.com/lovely-image.jpg">
</div>

I need to:

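  1. Find every URL in the HTML body.

  2. Check whether it is already clickable or is not plain text at all, i.e. whether it sits inside an 'a' element, an 'img' attribute, a '!--' comment, etc.

  3. If it is not, make it clickable by wrapping it in an 'a' element.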

How can I do that? A solution in either C# or JavaScript is fine.
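The three steps above can be sketched in plain JavaScript without a DOM. This is only a minimal illustration under stated assumptions: `linkifyHtml`, `URL_RE`, `TOKEN_RE` and `SKIP` are hypothetical names, the tokenizer is a rough regex split rather than a real HTML parser (it will misread attribute values that contain `>`), and the URL pattern is deliberately simple.

```javascript
// Simplified URL pattern (assumption): http, https and ftp only,
// with trailing punctuation such as "." or "," left outside the match.
const URL_RE = /\b(?:https?|ftp):\/\/[^\s<>"']+[^\s<>"'.,;:!?)]/g;

// Rough tokenizer: comments, then tags, then runs of text between tags.
const TOKEN_RE = /<!--[\s\S]*?-->|<[^>]+>|[^<]+/g;

// Elements whose text content is left untouched.
const SKIP = new Set(["a", "script", "style", "textarea"]);

function linkifyHtml(html) {
  let depth = 0; // how many skipped elements are currently open

  return html.replace(TOKEN_RE, (token) => {
    // Comments are copied through unchanged.
    if (token.startsWith("<!--")) return token;

    // Tags are copied through unchanged, so attributes such as
    // img src are never touched; only the skip depth is tracked.
    if (token.startsWith("<")) {
      const m = /^<(\/?)([a-zA-Z][\w-]*)/.exec(token);
      if (m && SKIP.has(m[2].toLowerCase())) {
        depth = m[1] ? Math.max(0, depth - 1) : depth + 1;
      }
      return token;
    }

    // Text inside a skipped element stays as it is.
    if (depth > 0) return token;

    // Plain text: wrap each URL in an anchor.
    return token.replace(URL_RE, (url) => `<a href="${url}">${url}</a>`);
  });
}
```

Run on the sample above, this should wrap only the URL on line 1 and leave the existing link and the img tag alone. For real-world markup a proper parser (the browser DOM, or Html Agility Pack in C#) is the safer choice.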

LATEST UPDATE: Changing the project build target from 4.7.2 to 4.5 and back to 4.7.2 fixed the "bug".

UPDATE: This is my solution, written with help from @jira. The problem is that the nodes don't change at all. According to the debugger, the recursive function does its job and replaces the links, but the HTML document is never updated. No modification made inside the function is visible outside it, and I don't know why: InnerText changes, but InnerHtml doesn't.

var htmlVersion = "<html><head></head><body>\r\n"
   + "Some text\r\n"
   + "<div>http://google.com</div>\r\n"
   + " Then later more text: http://500px.com\r\n"
   + "<div>Sub <span>abc</span> Back text</div>\r\n"
   + "And the final text"
   + "</body></html>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlVersion);

// Linkify body
var modified = false;
var bodyNode = doc.DocumentNode.SelectSingleNode("//body"); 
var before = bodyNode.InnerHtml;
bodyNode = Linkify(bodyNode);
modified = modified || bodyNode.InnerHtml != before;
// modified is false !!!

The recursive Linkify function:

HtmlAgilityPack.HtmlNode Linkify(HtmlAgilityPack.HtmlNode node)
{
    if (node.Name == "a") // It's already a link
    {
        return node;
    }

    if (node.Name == "#text") // Do replacement here
    {

        // Create links
        // https://stackoverflow.com/a/4750468/627193
        node.InnerHtml = Regex.Replace(node.InnerHtml,
            @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?)",
            "<a target='_blank' href='$1'>$1</a>");

    }

    for (int i = 0; i < node.ChildNodes.Count; i++) // Go for child nodes
    {
        node.ChildNodes[i] = Linkify(node.ChildNodes[i]);
    }
    return node;
}

Solution

  • Changing the project build target from 4.7.2 to 4.5 and then back to 4.7.2 fixed the "bug".

    Here is the working code:

    var htmlVersion = "<html><head></head><body>\r\n"
       + "Some text\r\n"
       + "<div>http://google.com</div>\r\n"
       + " Then later more text: http://500px.com\r\n"
       + "<div>Sub <span>abc</span> Back text</div>\r\n"
       + "And the final text"
       + "</body></html>";
    
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(htmlVersion);
    
    // Linkify body
    var modified = false;
    var bodyNode = doc.DocumentNode.SelectSingleNode("//body"); 
    var before = bodyNode.InnerHtml;
    bodyNode = Linkify(bodyNode);
    modified = modified || bodyNode.InnerHtml != before;
    

    The recursive Linkify function:

    HtmlAgilityPack.HtmlNode Linkify(HtmlAgilityPack.HtmlNode node)
    {
        if (node == null || node.Name == "a") // Nothing to do: no node, or it's already a link
        {
            return node;
        }
    
        if (node.Name == "#text") // Do replacement here
        {
    
            // Create links
            // https://stackoverflow.com/a/4750468/627193
            node.InnerHtml = Regex.Replace(node.InnerHtml,
                @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?)",
                "<a target='_blank' href='$1'>$1</a>");
    
        }
    
        for (int i = 0; i < node.ChildNodes.Count; i++) // Go for child nodes
        {
            node.ChildNodes[i] = Linkify(node.ChildNodes[i]);
        }
        return node;
    }
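The code above assigns markup to a text node's InnerHtml. An alternative, sketched here in JavaScript on a plain object tree, is to replace each text node in its parent's child list with new text and 'a' nodes, so the change is part of the tree itself. Everything in this sketch is an assumption for illustration: the node shape (`type`, `name`, `attrs`, `children`, `value`) and the helpers `text`, `element`, `splitText`, `linkifyTree` and `serialize` are hypothetical and are not Html Agility Pack APIs. Text values are assumed to be already escaped.

```javascript
// Simplified URL pattern (assumption): http, https and ftp only.
const URL_PATTERN = /\b(?:https?|ftp):\/\/[^\s<>"']+[^\s<>"'.,;:!?)]/g;
const SKIPPED = new Set(["a", "script", "style"]);

// Hypothetical node constructors.
const text = (value) => ({ type: "text", value });
const element = (name, attrs, children) => ({ type: "element", name, attrs, children });

// Splits one text value into a list of text nodes and <a> element nodes.
function splitText(value) {
  const parts = [];
  let last = 0;
  for (const match of value.matchAll(URL_PATTERN)) {
    if (match.index > last) parts.push(text(value.slice(last, match.index)));
    parts.push(element("a", { href: match[0] }, [text(match[0])]));
    last = match.index + match[0].length;
  }
  if (last < value.length) parts.push(text(value.slice(last)));
  return parts;
}

// Walks the tree, rebuilds each child list, and returns true if a link was added.
function linkifyTree(node) {
  if (node.type !== "element" || SKIPPED.has(node.name)) return false;
  let changed = false;
  const next = [];
  for (const child of node.children) {
    if (child.type === "text") {
      const parts = splitText(child.value);
      if (parts.some((p) => p.type === "element")) changed = true;
      next.push(...parts);
    } else {
      if (linkifyTree(child)) changed = true;
      next.push(child);
    }
  }
  node.children = next;
  return changed;
}

// Minimal serializer, used only to inspect the result.
function serialize(node) {
  if (node.type === "text") return node.value;
  const attrs = Object.entries(node.attrs).map(([k, v]) => ` ${k}="${v}"`).join("");
  return `<${node.name}${attrs}>${node.children.map(serialize).join("")}</${node.name}>`;
}
```

Because `linkifyTree` reports whether it changed anything, the `modified` flag no longer needs a before/after string comparison. In Html Agility Pack the same idea would presumably map to creating new nodes and swapping them in on the parent, but that mapping is not shown or tested here.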