Tags: java, regex, regex-lookarounds, regex-group

How to extract multiple values matching a pattern after a specific keyword?


I need help extracting multiple passport numbers that appear after the keyword "passport" using a regex.

Text:

my friends passport numbers are V123456, V123457 and V123458

Regex:

(?<=passport)\s*(?:\w+\s){0,10}\s*(\b[a-zA-Z]{0,2}\d{6,12}[a-zA-Z]{0,2}\b)

Expected matches output:

V123456
V123457
V123458

Actual output:

V123456

Solution

  • You can't rely on a lookbehind here: the match must start right after the keyword, so reaching the later numbers would require a lookbehind of indefinite length. That is supported, but only in recent Java versions.
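To see why the lookbehind version stops after one hit, here is a minimal sketch that runs the pattern from the question unchanged. The class name `LookbehindSketch` and the helper `extract` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindSketch { // hypothetical class name
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "(?<=passport)\\s*(?:\\w+\\s){0,10}\\s*(\\b[a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2}\\b)");

    // Hypothetical helper: collects group 1 of every match.
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Every match must begin directly after "passport", and there is
        // only one such position, so only the first number is reported.
        System.out.println(extract(
            "my friends passport numbers are V123456, V123457 and V123458"));
        // prints [V123456]
    }
}
```

The second `find()` call starts after `V123456`, where `(?<=passport)` can no longer succeed, so the loop ends.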

    You may use a pattern based on the \G operator:

    (?:\G(?!\A)|\bpassport\b).*?\b([a-zA-Z]{0,2}\d{6,12}[a-zA-Z]{0,2})\b
    

    See the regex demo. Pattern details:

    • (?:\G(?!\A)|\bpassport\b) - either a whole word passport (\bpassport\b) or (|) the end of the previous successful match (\G(?!\A))
    • .*? - zero or more of any characters, as few as possible (since the pattern is compiled with Pattern.DOTALL, the . can match any character, including line break characters)
    • \b([a-zA-Z]{0,2}\d{6,12}[a-zA-Z]{0,2})\b - a whole word that starts with zero, one or two ASCII letters, then has six to twelve digits and ends with zero, one or two ASCII letters; the word itself is captured in group 1.
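The role of each alternative in the pattern breakdown above can be checked with a small sketch. It assumes the same pattern and `Pattern.DOTALL` flag; the class name `GAnchorSketch`, the helper `extract`, and both sample strings are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GAnchorSketch { // hypothetical class name
    // Start at the keyword, or continue from the end of the previous match.
    static final Pattern CHAINED = Pattern.compile(
        "(?:\\G(?!\\A)|\\bpassport\\b).*?\\b([a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2})\\b",
        Pattern.DOTALL);

    // Hypothetical helper: collects group 1 of every match.
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CHAINED.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // A number before the keyword is skipped; the numbers after it
        // are chained together through \G.
        System.out.println(extract("ticket A111111, passport V123456 and V123457"));
        // prints [V123456, V123457]

        // Without the keyword, \G could only match at the start of the
        // input, and (?!\A) rules that position out, so nothing is found.
        System.out.println(extract("numbers V123456, V123457"));
        // prints []
    }
}
```

Dropping `(?!\A)` would let `\G` match at the very start of the input, so the second call would return numbers even though the keyword never appears.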

    See the Java demo below:

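    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PassportDemo { // class name is arbitrary
        public static void main(String[] args) {
            String s = "my friends passport numbers are V123456, V123457 and V123458";
            String rx = "(?:\\G(?!\\A)|\\bpassport\\b).*?\\b([a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2})\\b";
            Pattern pattern = Pattern.compile(rx, Pattern.DOTALL);
            Matcher matcher = pattern.matcher(s);
            while (matcher.find()) {
                System.out.println(matcher.group(1)); // group 1 is the passport number
            }
        }
    }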
    

    Output:

    V123456
    V123457
    V123458
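If the `\G` construct feels hard to maintain, a two-step scan is a possible alternative. This is a different technique from the single-pattern answer above, not a restatement of it: locate the keyword first, then search the rest of the text with the plain number pattern. The class name `TwoStepSketch` and the helper `numbersAfterKeyword` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStepSketch { // hypothetical class name
    static final Pattern KEYWORD = Pattern.compile("\\bpassport\\b");
    static final Pattern NUMBER =
        Pattern.compile("\\b[a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2}\\b");

    // Hypothetical helper: numbers that appear after the first keyword.
    public static List<String> numbersAfterKeyword(String text) {
        List<String> found = new ArrayList<>();
        Matcher keyword = KEYWORD.matcher(text);
        if (!keyword.find()) {
            return found; // no keyword, nothing to extract
        }
        Matcher number = NUMBER.matcher(text);
        // find(int) resets the matcher and starts searching at that index;
        // later find() calls continue after the previous match.
        boolean hit = number.find(keyword.end());
        while (hit) {
            found.add(number.group());
            hit = number.find();
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(numbersAfterKeyword(
            "my friends passport numbers are V123456, V123457 and V123458"));
        // prints [V123456, V123457, V123458]
    }
}
```

The trade-off is two patterns instead of one, in exchange for simpler expressions that avoid both the lookbehind and `\G`.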