I'm trying to create a regex that finds raw e-mail addresses in a long HTML body that haven't been linked yet, so that I can replace them with properly linked e-mail addresses. For instance,
<a href="mailto:test@example.com">test@example.com</a>
should return false, while
test@example.com
should return true.
I've tried:
html = html.replaceAll(
/(?:(?!href=['"]mailto:)(?!<a.*?>))([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)(?:(?!<\/a>))/gi,
"<a href=\'mailto:$1\'>$1</a>"
);
The idea is to find e-mail addresses that do not have href='mailto: or an <a> tag before them, and do not have </a> after them. However, the negative lookahead (?!...) is not giving me the intended result:
let regex = new RegExp(/(?:(?!href=['"]mailto:)(?!<a.*?>))([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)(?:(?!<\/a>))/, 'gi');
console.log(regex.test("href='mailto:test@example.com"))
As the snippet above shows, despite the negative lookahead, testing href='mailto:test@example.com against the regex returns true.
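A likely explanation: a lookahead is only checked at the position where a match attempt starts, and the engine retries at every later position. Using exec instead of test shows where the match actually begins. This is a small demonstration sketch; the address test@example.com is just a placeholder.

```javascript
// The regex from the question, without the g flag so exec() always starts at index 0
const regex = /(?:(?!href=['"]mailto:)(?!<a.*?>))([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)(?:(?!<\/a>))/i;

// At index 0 the lookahead rejects the attempt, but at index 13 (the "t" of
// "test") neither lookahead applies, so the address is matched anyway.
const m = regex.exec("href='mailto:test@example.com");
console.log(m.index); // 13
console.log(m[1]);    // test@example.com

// The trailing lookahead has a similar hole: backtracking shortens the
// top-level domain so the match no longer ends right before "</a>".
const m2 = regex.exec("test@example.com</a>");
console.log(m2[1]);   // test@example.co
```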
I also tried:
(.*)^(?:(?!href=['"]mailto:)(?!<a.*?>))([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)(?:(?!<\/a>))
It no longer matches e-mails with the href='mailto: prefix, but now
regex.test("1: test@example.com") // returns false
The e-mail addresses can appear inline, so I can't use the ^ anchor at the beginning.
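The reason the second attempt behaves this way: without the m flag, ^ matches only at index 0 of the input, so the pattern can succeed only when the address is the very first thing in the string. A minimal check using the pattern from the second attempt (placeholder addresses):

```javascript
// Second attempt from the question; (.*) can only match the empty string
// before ^, so every match is forced to begin at index 0.
const anchored = /(.*)^(?:(?!href=['"]mailto:)(?!<a.*?>))([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)(?:(?!<\/a>))/;

console.log(anchored.test("test@example.com"));              // true: address starts at index 0
console.log(anchored.test("href='mailto:test@example.com")); // false: lookahead rejects index 0
console.log(anchored.test("1: test@example.com"));           // false: address is not at index 0
```

Note also that (.*) adds a capture group, so the address would move from $1 to $2 in the replacement string.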
Any ideas on how I can achieve this? Thanks in advance.
One approach would be to do a regex replacement with a callback function on the following pattern:
<a href="mailto:\S+">.*?<\/a>|\S+@\S+\.\S+
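This uses an alternation trick: at each position in the text, the regex first tries to match an entire anchor tag that already contains an email address. Failing that, the alternation falls back to matching a bare email address. Because a matched anchor tag is consumed as a whole, the address inside it is never matched on its own.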
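To see the alternation at work, matchAll can list what the pattern actually matches. This is an illustrative sketch with placeholder addresses (a@example.com, b@example.com):

```javascript
// Same pattern as above; matchAll() requires the g flag
const pattern = /<a href="mailto:\S+">.*?<\/a>|\S+@\S+\.\S+/g;

const text = 'Linked <a href="mailto:a@example.com">a@example.com</a> and plain b@example.com';

// The linked address never shows up on its own, because the whole
// anchor tag was consumed by the first alternative.
const matches = [...text.matchAll(pattern)].map((m) => m[0]);
console.log(matches);
// → ['<a href="mailto:a@example.com">a@example.com</a>', 'b@example.com']
```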
const input = "Here is a tag with email <a href=\"mailto:test@example.com\">test@example.com</a> and here is just the email test@example.com";
console.log("INPUT: " + input);
// Existing mailto anchors are tried first, bare addresses second.
// The pattern has no capture groups, so only the full match is needed.
const output = input.replace(/<a href="mailto:\S+">.*?<\/a>|\S+@\S+\.\S+/g, function (match) {
    // An existing anchor tag: leave it unchanged
    if (/<a href="mailto:\S+">.*?<\/a>/.test(match)) {
        return match;
    }
    // A bare address: wrap it in a mailto link
    return "<a href=\"mailto:" + match + "\">" + match + "</a>";
});
console.log("OUTPUT: " + output);
In the code snippet above, the callback function checks whether the match is an anchor tag that already contains an email address; if so, it returns the match unchanged. Every other match is a bare email address, which it wraps in an anchor tag before returning it.
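A variation on the same idea, sketched under two assumptions: existing links may quote the href value with either single or double quotes (the question's own replacement string emits single quotes), and the helper name linkifyEmails is purely illustrative. Wrapping the anchor alternative in a capture group lets the callback tell the two cases apart without running a second regex.

```javascript
// Hypothetical helper: wraps bare e-mail addresses in mailto links while
// leaving existing mailto anchors (single- or double-quoted href) untouched.
function linkifyEmails(html) {
  const pattern = /(<a href=["']mailto:[^"']+["']>.*?<\/a>)|\S+@\S+\.\S+/g;
  return html.replace(pattern, (match, existingLink) => {
    // Group 1 is only defined when the anchor alternative matched,
    // so an existing link is returned as-is.
    if (existingLink !== undefined) {
      return existingLink;
    }
    // Otherwise the match is a bare address
    return '<a href="mailto:' + match + '">' + match + '</a>';
  });
}

const once = linkifyEmails("Mail <a href='mailto:a@example.com'>a@example.com</a> or b@example.com");
const twice = linkifyEmails(once);
console.log(once);
console.log(twice === once); // true
```

Because already-linked addresses are skipped, running the function on its own output leaves it unchanged.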