I want to detect bare URLs in HTML source and turn them into links. I've searched Stack Overflow, but most answers are about detecting and converting links in plain text strings. Applying those to HTML produces invalid markup: for example, img src attributes get rewritten.
P.S. Close voters: please read the question carefully. It is not a duplicate.
For example, in the sample below line 1 needs to be converted, while lines 2 and 3 do not.
<!-- Sample html source -->
<div>
Line 1 : https://www.google.com/
Line 2 : <a href="https://www.google.com/">https://www.google.com/</a>
Line 3: <img src="http://a-domain.com/lovely-image.jpg">
</div>
I need to:
Find any URL in the HTML body
Check whether it is already clickable or part of the markup, i.e. whether it sits inside 'a', 'img', '!--', etc.
If it is not, make it clickable by wrapping it in 'a'
How can I do that? Either a C# or a JS solution is fine.
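The three steps above could be sketched roughly as follows in JavaScript. This is a minimal sketch under stated assumptions, not a definitive solution: it splits the source into tags/comments and text with a regular expression instead of a real HTML parser, so it assumes reasonably well-formed markup and can be confused by a raw `<` inside script content. The names `linkifyHtml`, `linkifyText`, `TOKEN_PATTERN`, and `SKIP_ELEMENTS` are hypothetical, and the URL pattern is adapted from the regex used in the C# code further down.

```javascript
// URL pattern adapted from the regex in the question's C# code.
const URL_PATTERN =
  /(?:https?|ftp):\/\/[\w-]+(?:\.[\w-]+)+(?:[\w\-.,@?^=%&:\/~+#]*[\w\-@?^=%&\/~+#])?/g;

// Matches one comment or one tag; quoted attribute values may contain '>'.
// Capture group 1 is the tag name (undefined for comments).
const TOKEN_PATTERN =
  /<!--[\s\S]*?-->|<\/?([a-zA-Z][\w:-]*)(?:"[^"]*"|'[^']*'|[^'">])*>/g;

// Elements whose text must be left alone (an assumed, adjustable set).
const SKIP_ELEMENTS = new Set(["a", "script", "style", "textarea"]);

// Wrap every URL found in a plain-text fragment in an anchor.
function linkifyText(text) {
  return text.replace(
    URL_PATTERN,
    (url) => `<a target='_blank' href='${url}'>${url}</a>`
  );
}

// Linkify only the text that lies between tags and outside skipped elements.
// Tags and comments are copied through untouched, so URLs in attributes
// (img src, a href) and in comments are never rewritten.
function linkifyHtml(html) {
  let out = "";
  let last = 0;      // index just past the previous token
  let skipDepth = 0; // > 0 while inside <a>, <script>, ...

  for (const m of html.matchAll(TOKEN_PATTERN)) {
    const text = html.slice(last, m.index);
    out += skipDepth > 0 ? text : linkifyText(text);
    out += m[0];
    last = m.index + m[0].length;

    const name = m[1] ? m[1].toLowerCase() : null;
    if (name !== null && SKIP_ELEMENTS.has(name)) {
      if (m[0].startsWith("</")) {
        skipDepth = Math.max(0, skipDepth - 1);
      } else if (!m[0].endsWith("/>")) {
        skipDepth += 1;
      }
    }
  }

  const tail = html.slice(last);
  return out + (skipDepth > 0 ? tail : linkifyText(tail));
}

// Usage with the sample from the question.
const sample = [
  "<div>",
  "Line 1 : https://www.google.com/",
  'Line 2 : <a href="https://www.google.com/">https://www.google.com/</a>',
  'Line 3: <img src="http://a-domain.com/lovely-image.jpg">',
  "</div>",
].join("\n");
console.log(linkifyHtml(sample));
```

With the sample input, only the URL on line 1 should end up wrapped in a new anchor; lines 2 and 3 pass through unchanged. For production use, a DOM-based or parser-based approach is likely more robust than this tokenizer.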
LATEST UPDATE: Changing the project build target from 4.7.2 to 4.5 and back to 4.7.2 fixed the "bug".
UPDATE: This is my solution, written with help from @jira. The problem is that the nodes do not change at all. Stepping through in the debugger shows that the recursive function does its job and replaces the links, but the HTML document is never updated. Modifications made inside the function have no effect outside it, and I don't know why: InnerText changes, but InnerHtml does not.
var htmlVersion = "<html><head></head><body>\r\n"
+ "Some text\r\n"
+ "<div>http://google.com</div>\r\n"
+ " Then later more text: http://500px.com\r\n"
+ "<div>Sub <span>abc</span> Back text</div>\r\n"
+ "And the final text"
+ "</body></html>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlVersion);
// Linkify body
var modified = false;
var bodyNode = doc.DocumentNode.SelectSingleNode("//body");
var before = bodyNode.InnerHtml;
bodyNode = Linkify(bodyNode);
modified = modified || bodyNode.InnerHtml != before;
// modified is false !!!
The recursive Linkify function:
HtmlAgilityPack.HtmlNode Linkify(HtmlAgilityPack.HtmlNode node)
{
    if (node.Name == "a") // It's already a link
    {
        return node;
    }
    if (node.Name == "#text") // Do replacement here
    {
        // Create links
        // https://stackoverflow.com/a/4750468/627193
        node.InnerHtml = Regex.Replace(node.InnerHtml,
            @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)",
            "<a target='_blank' href='$1'>$1</a>");
    }
    for (int i = 0; i < node.ChildNodes.Count; i++) // Go for child nodes
    {
        node.ChildNodes[i] = Linkify(node.ChildNodes[i]);
    }
    return node;
}
Changing the project build target from 4.7.2 to 4.5 and then back to 4.7.2 fixed the "bug".
Here is the working code:
var htmlVersion = "<html><head></head><body>\r\n"
+ "Some text\r\n"
+ "<div>http://google.com</div>\r\n"
+ " Then later more text: http://500px.com\r\n"
+ "<div>Sub <span>abc</span> Back text</div>\r\n"
+ "And the final text"
+ "</body></html>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlVersion);
// Linkify body
var modified = false;
var bodyNode = doc.DocumentNode.SelectSingleNode("//body");
var before = bodyNode.InnerHtml;
bodyNode = Linkify(bodyNode);
modified = modified || bodyNode.InnerHtml != before;
The recursive Linkify function:
HtmlAgilityPack.HtmlNode Linkify(HtmlAgilityPack.HtmlNode node)
{
    if (node == null || node.Name == "a") // Nothing to do: no node, or it's already a link
    {
        return node;
    }
    if (node.Name == "#text") // Do replacement here
    {
        // Create links
        // https://stackoverflow.com/a/4750468/627193
        node.InnerHtml = Regex.Replace(node.InnerHtml,
            @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)",
            "<a target='_blank' href='$1'>$1</a>");
    }
    for (int i = 0; i < node.ChildNodes.Count; i++) // Recurse into child nodes
    {
        node.ChildNodes[i] = Linkify(node.ChildNodes[i]);
    }
    return node;
}