java regex href

Replace URL in text with href (in Java)


I have to transform URLs entered in plain text into HTML hrefs, and I want to find multiple URLs.

This: Hi here is a link for you: http://www.google.com. Hope it works.

Will become: Hi here is a link for you: <a href='http://www.google.com'>http://www.google.com</a>. Hope it works.

Found this code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String transformURLIntoLinks(String text) {
    String urlValidationRegex = "(https?|ftp)://(www\\d?|[a-zA-Z0-9]+)?.[a-zA-Z0-9-]+(\\:|.)([a-zA-Z0-9.]+|(\\d+)?)([/?:].*)?";
    Pattern p = Pattern.compile(urlValidationRegex);
    Matcher m = p.matcher(text);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        String found = m.group(0);
        // Quote the replacement so a '$' or '\' inside a URL is not read as a group reference.
        m.appendReplacement(sb, Matcher.quoteReplacement("<a href='" + found + "'>" + found + "</a>"));
    }
    m.appendTail(sb);
    return sb.toString();
}

Posted here https://stackoverflow.com/a/17704902

It works perfectly for all URLs properly prefixed with http, but I also want to find URLs starting with just www.

Can anyone that knows his regex help me out?


Solution

  • Make the (https?|ftp):// part optional by wrapping it in a group and adding a question mark ? after the group, which gives ((https?|ftp)://)?

    Use this RegEx:

    \b((https?|ftp):\/\/)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[A-Za-z]{2,6}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)*(?:\/|\b)
    

    In a Java string literal, escape every backslash (\) by doubling it:

    \\b((https?|ftp):\\/\\/)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[A-Za-z]{2,6}\\b(\\/[-a-zA-Z0-9@:%_\\+.~#?&//=]*)*(?:\\/|\\b)
    

    Examples

    Example 1 (with protocol, in sentence)


    Example 2 (without protocol, in sentence)
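
The two cases named above (a URL with and without a protocol inside a sentence) can be tried with a minimal Java sketch that plugs the escaped regex into the method from the question. The class name UrlLinker and the choice to prepend http:// to the href when no protocol was matched are assumptions for illustration, not part of the original answer; without a scheme, a browser would treat href='www.google.com' as a relative path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
final class UrlLinker {
    // The answer's pattern, with every backslash doubled for a Java string literal.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "\\b((https?|ftp):\\/\\/)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[A-Za-z]{2,6}\\b"
        + "(\\/[-a-zA-Z0-9@:%_\\+.~#?&//=]*)*(?:\\/|\\b)");

    static String transformUrlsIntoLinks(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String found = m.group(0);
            // Group 1 is the optional "scheme://" prefix; it is null for URLs like "www.google.com".
            // Assumption: default to http:// so the href is not treated as a relative path.
            String href = (m.group(1) != null) ? found : "http://" + found;
            String link = "<a href='" + href + "'>" + found + "</a>";
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(link));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(transformUrlsIntoLinks(
            "Hi here is a link for you: http://www.google.com. Hope it works."));
        System.out.println(transformUrlsIntoLinks(
            "Hi here is a link for you: www.google.com. Hope it works."));
    }
}
```

In both sentences the trailing full stop stays outside the link, because the pattern requires two to six letters after the last dot it consumes. Note that the pattern is permissive: any token shaped like name.tld, such as file.txt, will also be linked, so it may need tightening for text that contains file names or e-mail addresses.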
