
Java regex positive look-ahead but match unique characters only?


I'm trying to match a String input with the criteria below:

  1. The first characters are unique lowercase English letters
  2. The next characters represent a year from 1500 to 2020
  3. The next characters can only be 10, or 100, or 1000
  4. The last character will be a digit 0 through 9

The regex I have created, which I believe is mostly correct, is shown below with explanations:

String validRegex =
    "^" +                                     // start of string
    "(?=.*[a-z].*[a-z].*[a-z])" +             // ensure string has only 3 consecutive lowercase English letters
    "(?=.*[0-9].*[0-9].*[0-9].*[0-9])" +      // ensure string has only 4 digits representing year, e.g. 2020
    "(?=.*([0-9].*[0-9]) | ([0-9].*[0-9].*[0-9]) | ([0-9].*[0-9].*[0-9].*[0-9]))" + // ensure 10, 100, or 1000
    "(?=.*[0-9])" +                           // ensure last character is a digit 0-9
    "(?=\\S+$)" +                             // ensure string has no whitespace
    ".{10,12}" +                              // entire string length must be from 10 through 12 characters
    "$";                                      // end of string

Is there a simple way to update my regex so that it only accepts unique consecutive characters?


Solution

  • Look:

    • The entire input string is always 10 to 12 characters long - ^.{10,12}$ (however, you do not need to add this to the overall pattern, because the parts below already add up to 10, 11 or 12 characters: 3 + 4 + 2..4 + 1)
    • The first 3 characters are unique lowercase English letters ([a-z]) - ^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z] (each negative lookahead rejects a letter that repeats an earlier captured one; the backslashes are doubled because the pattern is written as a Java string literal)
    • The next 4 characters represent a year from 1500 to 2020, e.g. 2020 - (?:1[5-9][0-9]{2}|20[01][0-9]|2020)
    • The next characters are 10, 100, or 1000, so at minimum 2 chars (i.e. 10) and at most 4 chars (i.e. 1000) - [0-9]{2,4} (note that this accepts any 2 to 4 digits, not only those three literal values)
    • The last character is a digit 0 through 9 - [0-9]
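The uniqueness part is the least obvious piece, so here is a minimal sketch that tests it in isolation. The class name `UniqueLettersSketch`, the helper `hasThreeUniqueLetters`, and the trailing `$` anchor are assumptions added for illustration; only the pattern prefix comes from the list above.

```java
import java.util.regex.Pattern;

final class UniqueLettersSketch {
    // Prefix from the list above, with a trailing $ added (an assumption)
    // so that the three-letter part can be checked on its own.
    // (?!\1) rejects a second letter equal to the first;
    // (?!\1|\2) rejects a third letter equal to either earlier letter.
    static final Pattern THREE_UNIQUE =
            Pattern.compile("^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z]$");

    static boolean hasThreeUniqueLetters(String s) {
        return THREE_UNIQUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Only "abc" has three distinct letters.
        for (String s : new String[] {"abc", "aab", "aba", "abb"}) {
            System.out.println(s + " -> " + hasThreeUniqueLetters(s));
        }
    }
}
```

Only `abc` should print `true`; the other three inputs each repeat a letter and are rejected by one of the lookaheads.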

    Joining these bits, you get

    String regex = "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$";
    

    See the regex demo.
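As a rough check of the joined pattern, the sketch below runs it against a few sample inputs. It also compares it with a hypothetical stricter variant that replaces `[0-9]{2,4}` with `10{1,3}`, which would accept only the literal values 10, 100 and 1000. The class name, the method names, the sample strings and the `STRICT` pattern are all assumptions for illustration, not part of the original answer.

```java
import java.util.regex.Pattern;

final class FullPatternSketch {
    // The joined pattern from the answer: the third part accepts any 2 to 4 digits.
    static final Pattern ANSWER = Pattern.compile(
            "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$");

    // Hypothetical stricter variant: 10{1,3} accepts only 10, 100 or 1000.
    static final Pattern STRICT = Pattern.compile(
            "^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)10{1,3}[0-9]$");

    static boolean matchesAnswer(String s) {
        return ANSWER.matcher(s).matches();
    }

    static boolean matchesStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc2020105",   // valid: abc + 2020 + 10 + 5
            "abc150010009", // valid: abc + 1500 + 1000 + 9
            "aab2020105",   // repeated letter
            "abc2021105",   // year out of range
            "abc2020205"    // third part is 20: accepted by ANSWER only
        };
        for (String s : samples) {
            System.out.println(s + " answer=" + matchesAnswer(s) + " strict=" + matchesStrict(s));
        }
    }
}
```

The two patterns should agree on every sample except `abc2020205`. Which behavior is wanted depends on how literally the third requirement is meant.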

    If you plan to support both lower- and uppercase letters, add the case-insensitive modifier (?i) at the start:

    String regex = "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$";
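One consequence of `(?i)` worth checking: as far as I can tell, Java also compares the back-references `\1` and `\2` case-insensitively when this flag is on, so `a` followed by `A` counts as a repeated letter. A small sketch to confirm this on your own JDK (the class name `CaseInsensitiveSketch`, the helper `isValid` and the sample inputs are illustrative assumptions):

```java
import java.util.regex.Pattern;

final class CaseInsensitiveSketch {
    // Same pattern as above, with the (?i) flag embedded at the start.
    static final Pattern CI = Pattern.compile(
            "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9]$");

    static boolean isValid(String s) {
        return CI.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Mixed case with three different letters.
        System.out.println("AbC2020105 -> " + isValid("AbC2020105"));
        // 'a' and 'A' differ only in case, so the lookahead treats them as a repeat.
        System.out.println("aAb2020105 -> " + isValid("aAb2020105"));
    }
}
```

If `a` and `A` should count as different letters, one option is to drop `(?i)` and widen the character classes to `[a-zA-Z]` instead.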
    

    If there can be a letter at the end, not just a digit, you may use

    String regex = "(?i)^([a-z])(?!\\1)([a-z])(?!\\1|\\2)[a-z](?:1[5-9][0-9]{2}|20[01][0-9]|2020)[0-9]{2,4}[0-9a-z]$";
    

    See this regex demo.

    To create regex number ranges, you may use well-known services such as gamon.webfactional.com, richie-bendall.ml, or MyRegexTester.com.
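Whichever tool generates a range pattern, it is cheap to verify it by brute force. The sketch below compares the year alternation used in this answer against a plain numeric check for every four-digit string from 0000 to 9999. The class name `YearRangeCheck`, the method `countMismatches`, and the added `^`/`$` anchors are assumptions for illustration.

```java
import java.util.regex.Pattern;

final class YearRangeCheck {
    // Year alternation from the answer, anchored (an assumption) so it is tested on its own.
    static final Pattern YEAR = Pattern.compile("^(?:1[5-9][0-9]{2}|20[01][0-9]|2020)$");

    // Counts the four-digit strings for which the regex and the numeric range check disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int n = 0; n <= 9999; n++) {
            String s = String.format("%04d", n);        // zero-padded, e.g. 0042
            boolean byRegex = YEAR.matcher(s).matches();
            boolean byNumber = n >= 1500 && n <= 2020;  // intended range
            if (byRegex != byNumber) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("Mismatches: " + countMismatches()); // prints "Mismatches: 0"
    }
}
```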

    See the Java demo:

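    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            // Group 1 wraps the three letters, so the back-references shift to \2 and \3.
            String regex = "(?i)(([a-z])(?!\\2)([a-z])(?!\\2|\\3)[a-z])(1[5-9][0-9]{2}|20[01][0-9]|2020)([0-9]{2,4})([0-9a-z])";
            String s = "AVG190420T";
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(s);
            // find() looks for the pattern anywhere in s; use matches() to require the whole string to match.
            if (matcher.find()) {
                System.out.println("Part 1: " + matcher.group(1)); // three unique letters
                System.out.println("Part 2: " + matcher.group(4)); // year
                System.out.println("Part 3: " + matcher.group(5)); // 2 to 4 digits
                System.out.println("Part 4: " + matcher.group(6)); // final digit or letter
            } else {
                System.out.println(s + " does not match the pattern.");
            }
        }
    }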
    

    Output:

    Part 1: AVG
    Part 2: 1904
    Part 3: 20
    Part 4: T