Check Domain Names According to RFC 1035 Standard in Java


I am trying to write code that checks whether a domain name is valid according to the RFC 1035 standard (https://www.rfc-editor.org/rfc/rfc1035). RFC 1035 specifies the following syntax for domain names:

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain
names, no significance is attached to the case.  That is, two names with
the same spelling but different case are to be treated as if identical.

The labels must follow the rules for ARPANET host names.  They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.  There are also some
restrictions on the length.  Labels must be 63 characters or less.

I have written the following Java code to check whether a domain name is valid according to RFC 1035.

//DomainUtils.java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainUtils {

   private static Pattern pDomainNameOnly1;
   private static Pattern pDomainNameOnly2;

   private static final String DOMAIN_NAME_PATTERN_CHK_1 = "^(?![0-9-])[A-Za-z0-9-]{1,63}(?<!-)$";
   private static final String DOMAIN_NAME_PATTERN_CHK_2 = "^((?![0-9-])[A-Za-z0-9-]{1,63}(?<!-)\\.)+(?![0-9-])[A-Za-z0-9-]{1,63}(?<!-)$";

   static {
       pDomainNameOnly1 = Pattern.compile(DOMAIN_NAME_PATTERN_CHK_1);
       pDomainNameOnly2 = Pattern.compile(DOMAIN_NAME_PATTERN_CHK_2);
   }

   public static boolean isValidDomainName(String domainName) {
       return (pDomainNameOnly1.matcher(domainName).find() || pDomainNameOnly2.matcher(domainName).find() || domainName.equals(" "));
   }

}

and

//Main.java
public class Main{
   public static void main(String[] args){
       boolean valid = DomainUtils.isValidDomainName("a123456789a123456789a123456789a123456789a123456789a1234567891234.ARPA"); //check if domain name is valid or not
       System.out.println("Valid domain name : " + valid);
   }

}

Is there a more efficient way (other than what I have written) to check whether a domain name is valid according to RFC 1035? Also, where can I verify that my code handles the corner cases of RFC 1035 correctly? Are there existing libraries I can use for this check?


Solution

  • Try this:

    ^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*$
    


    To construct this expression, start with the label component: a single character from the set a-zA-Z, optionally followed by a sequence of characters from the set a-zA-Z0-9- that ends in a letter or digit (a hyphen is permitted inside a label, but not at its beginning or end). This leads to

    [a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?
    

    This expression is then repeated according to the following pattern:

    A(\.A)*
    

    which means one instance of A, followed by any number (including zero) of repetitions of a dot followed by another instance of A.

    By substituting the label regex above for each A, we arrive at the final regexp. The anchors ^ and $ reject any other text surrounding the domain name at the beginning or end of the string.
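
    The construction above can be sketched in Java roughly as follows. This is only an illustration: the class name DomainRegexSketch and its members are made up for the example, and it relies on Matcher.matches() instead of the ^ and $ anchors, since matches() already requires the whole input to match.

    ```java
    import java.util.regex.Pattern;

    // Hypothetical helper class, named only for this sketch.
    final class DomainRegexSketch {

        // One label: a letter, optionally followed by letters, digits or
        // hyphens, and then ending in a letter or digit.
        static final String LABEL = "[a-zA-Z](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

        // The A(\.A)* shape, with LABEL substituted for every A.
        static final Pattern DOMAIN =
                Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

        static boolean isValid(String name) {
            // matches() succeeds only if the entire input matches the pattern.
            return DOMAIN.matcher(name).matches();
        }

        public static void main(String[] args) {
            String[] samples = {"example.com", "a", "a-1.b2", "1abc.com", "abc-.com", "a..b", ""};
            for (String s : samples) {
                System.out.println("\"" + s + "\" -> " + isValid(s));
            }
        }
    }
    ```

    Building the pattern from a single LABEL constant keeps the two copies of the label expression from drifting apart when one of them is edited.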

    To also enforce the limit of 63 characters per label, you can use

    [a-zA-Z]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
    

    but beware: in a regex engine that compiles patterns to a table-driven automaton, this bounded repetition expands into an automaton with many states, so you may prefer to drop the length check from the regex if you are short of memory.
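
    As a rough sketch of how the length-limited label could be used and exercised against a few corner cases in Java (the class BoundedDomainRegexSketch and the helper repeat are hypothetical names for this example; the sketch does not handle the " " alternative from the grammar, nor the limit on the overall name length that RFC 1035 also imposes):

    ```java
    import java.util.regex.Pattern;

    // Hypothetical helper class, named only for this sketch.
    final class BoundedDomainRegexSketch {

        // A label of 1 to 63 characters: one leading letter, then optionally
        // up to 61 interior characters followed by one trailing letter or digit.
        static final String LABEL = "[a-zA-Z](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?";

        static final Pattern DOMAIN =
                Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

        static boolean isValid(String name) {
            // matches() succeeds only if the entire input matches the pattern.
            return DOMAIN.matcher(name).matches();
        }

        // Builds a string of n copies of c (avoids depending on String.repeat).
        static String repeat(char c, int n) {
            StringBuilder sb = new StringBuilder(n);
            for (int i = 0; i < n; i++) {
                sb.append(c);
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            String label63 = "a" + repeat('b', 62); // longest allowed label
            String label64 = "a" + repeat('b', 63); // one character too long
            System.out.println("63-char label: " + isValid(label63 + ".ARPA"));
            System.out.println("64-char label: " + isValid(label64 + ".ARPA"));
            // The test string from the question; its first label has 64 characters.
            System.out.println("question input: " + isValid(
                    "a123456789a123456789a123456789a123456789a123456789a1234567891234.ARPA"));
        }
    }
    ```

    Boundary inputs such as a 63-character and a 64-character label, a leading digit, and a trailing hyphen are a reasonable starting set when checking an implementation against the grammar quoted in the question.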