java regex regex-lookarounds regex-group regex-greedy

How to get more group matches after a specific string?


How is it possible to get a variable number of groups from a regular expression?

I want to extract the substrings of the following string:

group g1 l1 l2 l3 g2 g3.l1

as groups. The output should include g1, l1, l2, l3, g2 and g3.l1.

I already tried to get these with regular expressions like this:

group (\S+)\s(\S+)*

My problem is that the group expression ( ) gives me a fixed number of groups, but the number of substrings can vary. My string could also look like this: group g1 g2.l1


Solution

  • Your pattern starts by matching the literal group and then uses 2 capturing groups. You get only 2 groups because the number of capturing groups is fixed by the pattern, and the repetition in the last group, (\S+)*, repeats only non-whitespace chars \S and cannot match the whitespace between the substrings.

    If you changed that to (\s\S+)*, the capturing group would be repeated, but a repeated capturing group keeps only the value of its last repetition.
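
    To make the "last repetition only" behavior concrete, here is a minimal sketch. The class and method names are made up for illustration, and the combined pattern group (\S+)(\s\S+)* is one assumed way of applying the change described above:

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class RepeatedGroupDemo {
        // Hypothetical helper: matches the whole input and returns whatever
        // group 2 holds afterwards, or null if there is no match or group 2
        // did not take part in the match.
        static String lastRepetition(String input) {
            Matcher m = Pattern.compile("group (\\S+)(\\s\\S+)*").matcher(input);
            return m.matches() ? m.group(2) : null;
        }

        public static void main(String[] args) {
            // Group 2 is repeated five times, but only the final repetition is kept.
            System.out.println(lastRepetition("group g1 l1 l2 l3 g2 g3.l1"));
        }
    }
    ```

    The whole string matches, yet group 2 ends up holding only " g3.l1" (including the leading space), so l1, l2, l3 and g2 are not available as groups.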

    What you can do instead is use \G to get repeated matches by asserting the position at the end of the previous match:

    (?:^group |\G(?!^))(\S+)(?:\s+|$)
    

    In Java, with the backslashes escaped:

    String regex = "(?:^group |\\G(?!^))(\\S+)(?:\\s+|$)";
    

    That will match

    • (?: Non-capturing group
      • ^group Match group and a space at the start of the string
      • | Or
      • \G(?!^) Assert the position at the end of the previous match; because \G also matches at the start of the string for the first match, (?!^) rules that position out
    • ) Close the non-capturing group
    • (\S+) Capture 1+ non-whitespace chars in group 1
    • (?:\s+|$) Match either 1+ whitespace chars or assert the end of the string


    For example

    String regex = "(?:^group |\\G(?!^))(\\S+)(?:\\s+|$)";
    String string = "group g1 l1 l2 l3 g2 g3.l1";
    
    Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
    Matcher matcher = pattern.matcher(string);
    
    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }
    

    Result

    g1
    l1
    l2
    l3
    g2
    g3.l1
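
    Since the number of substrings varies, it may be convenient to collect the matches into a list. The following is only a sketch: the class name GroupTokens and the method tokensAfterGroup are hypothetical, and Pattern.MULTILINE is left out on the assumption that each input is a single line.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class GroupTokens {
        // Same pattern as in the example above, compiled once and reused.
        private static final Pattern TOKEN =
                Pattern.compile("(?:^group |\\G(?!^))(\\S+)(?:\\s+|$)");

        // Hypothetical helper: returns every whitespace-separated substring
        // that follows a leading "group ", or an empty list if the input
        // does not start with "group ".
        static List<String> tokensAfterGroup(String input) {
            List<String> tokens = new ArrayList<>();
            Matcher matcher = TOKEN.matcher(input);
            while (matcher.find()) {
                tokens.add(matcher.group(1));
            }
            return tokens;
        }

        public static void main(String[] args) {
            System.out.println(tokensAfterGroup("group g1 l1 l2 l3 g2 g3.l1")); // [g1, l1, l2, l3, g2, g3.l1]
            System.out.println(tokensAfterGroup("group g1 g2.l1"));             // [g1, g2.l1]
            System.out.println(tokensAfterGroup("other g1 g2"));                // []
        }
    }
    ```

    The last call returns an empty list because the first match has to begin with group at the start of the string; without it, \G(?!^) never gets a previous match to continue from.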