java, regex, validation, domain-name

I am trying to find a regular expression for validating a domain name in Java


I am trying to use the following regular expression to validate a domain name and its TLD (IDN domain names are not validated for now), but every time I validate, the result is false.

/**
 * This regular expression allows multiple levels of subdomains, followed by a domain and then the TLD.
 * It does not allow www. or http:// or https:// at the beginning of the string.
 * Only uppercase/lowercase letters, digits and hyphens are allowed in a subdomain or domain; only uppercase/lowercase letters and digits are allowed in the TLD. The TLD cannot begin with a digit.
 * A domain or subdomain can be 1 to 63 characters long; the TLD can be 2 to 63 characters long.
 * The total length of the domain name should not exceed 256 characters.
 * 
 * @param domainName - String value for domain that needs to be validated
 * @return true if domain name matches with the regex pattern else false
 */
public static boolean isValidDomainName(String domainName) {
    if (StringUtils.isNotBlank(domainName) && (domainName.trim().length() <= 256)) {
        return Pattern.compile("^(?!https?://|www\\.)[a-zA-Z0-9][a-zA-Z0-9\\-]{0,62}+\\.([a-zA-Z][a-zA-Z0-9]{1,62}){1,2})$").matcher(domainName.trim()).find();
    }
    else {
        return false;
    }
}

Here is the list of input values used:

www.google.com
https://www.google.com
.google.com
a.a
-testdomain.google.com
testdomain.a.a
5ubd0m41n.T35t-d0m41n.testtopleveldomain
google.com
subd0m41n.T35t-d0m41n.testtopleveldomain

Any help would be highly appreciated.


Solution

  • I tried to use your regex, but it appears to have an extra closing parenthesis (or to be missing an opening one). Here is a corrected pattern:

    ^(?!(?:https?:\/\/)?www\.)[a-z0-9][a-z0-9-]{0,62}(?:\.[a-z0-9][a-z0-9-]{0,62})+$
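    With backslashes escaped for a Java string literal:
    ^(?!(?:https?:\\/\\/)?www\\.)[a-z0-9][a-z0-9-]{0,62}(?:\\.[a-z0-9][a-z0-9-]{0,62})+$
    The pattern lists only lowercase letters, so compile it with Pattern.CASE_INSENSITIVE to accept uppercase input.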
    

    Explanation:

     ^                           # Anchors to the beginning of the string.
     (?!                         # Opens negative lookahead (NLA)
         (?:                     # Opens non-capturing group (NCG)
             https?              # Literal http, optionally followed by s
             :                   # Literal :
             \/                  # Literal /
             \/                  # Literal /
         )?                      # Closes NCG; the whole group is optional
         www                     # Literal www
         \.                      # Literal .
     )                           # Closes NLA
     [a-z0-9]                    # First character of the first label: a letter or digit
     [a-z0-9-]{0,62}             # Up to 62 more letters, digits, or hyphens
     (?:                         # Opens NCG
         \.                      # Literal .
         [a-z0-9]                # First character of the next label: a letter or digit
         [a-z0-9-]{0,62}         # Up to 62 more letters, digits, or hyphens
     )+                          # Closes NCG; repeated one or more times
     $                           # Anchors to the end of the string.
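
For reference, here is a minimal sketch of how the pattern above might be dropped into the method from the question. The class name DomainNameValidator is hypothetical, the Pattern.CASE_INSENSITIVE flag is an assumption added because the pattern lists only lowercase letters, and the 256-character limit is simply carried over from the question. The null/empty check uses plain Java in place of StringUtils so the example has no external dependency.

```java
import java.util.regex.Pattern;

public class DomainNameValidator {

    // Pattern from the answer, with backslashes escaped for a Java string literal.
    // Compiled once and reused; CASE_INSENSITIVE is an assumption (see lead-in).
    private static final Pattern DOMAIN_NAME = Pattern.compile(
            "^(?!(?:https?:\\/\\/)?www\\.)[a-z0-9][a-z0-9-]{0,62}(?:\\.[a-z0-9][a-z0-9-]{0,62})+$",
            Pattern.CASE_INSENSITIVE);

    public static boolean isValidDomainName(String domainName) {
        if (domainName == null) {
            return false;
        }
        String trimmed = domainName.trim();
        // Length limit taken from the question's Javadoc.
        if (trimmed.isEmpty() || trimmed.length() > 256) {
            return false;
        }
        // matches() requires the whole input to match the pattern.
        return DOMAIN_NAME.matcher(trimmed).matches();
    }

    public static void main(String[] args) {
        // Input values listed in the question.
        String[] inputs = {
            "www.google.com",
            "https://www.google.com",
            ".google.com",
            "a.a",
            "-testdomain.google.com",
            "testdomain.a.a",
            "5ubd0m41n.T35t-d0m41n.testtopleveldomain",
            "google.com",
            "subd0m41n.T35t-d0m41n.testtopleveldomain"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValidDomainName(input));
        }
    }
}
```

With this sketch, the first three inputs and -testdomain.google.com are rejected, and the rest are accepted. Note that a.a and testdomain.a.a pass because the pattern does not enforce the question's rules that the TLD has at least two characters and starts with a letter; those would need an extra constraint on the last label.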