
In Java regular expressions, how do I capture all the numbers from a string of unknown length?


My regular expression looks like this: "[a-zA-Z]+[ \t]*(?:,[ \t]*(\\d+)[ \t]*)*"

I can match the lines with this, but I don't know how to capture the numbers. I think it has something to do with grouping.

For example, from the string "asd , 5 ,2,6 ,8", how can I capture the numbers 5, 2, 6 and 8?

A few more examples:

sdfs6df -> no capture

fdg4dfg, 5 -> capture 5

fhhh3      ,     6,8    , 7 -> capture 6 8 and 7

asdasd1,4,2,7 -> capture 4 2 and 7

I need these numbers to continue my work. Thanks in advance.


Solution

  • You could match the leading word characters once, then use the \G anchor to capture each consecutive run of digits that follows a comma.
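
For context, a repeated capturing group in java.util.regex keeps only the text of its last iteration, which is why the pattern from the question matches the line but cannot hand back every number. The sketch below illustrates this with the question's own pattern; the class name `RepeatedGroupDemo` and the method `lastCaptured` are placeholders chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The pattern from the question, unchanged.
    private static final Pattern ORIGINAL =
            Pattern.compile("[a-zA-Z]+[ \\t]*(?:,[ \\t]*(\\d+)[ \\t]*)*");

    // Returns group 1 of the first match; null if nothing matched
    // or if the repeated group never took part in the match.
    static String lastCaptured(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The whole line matches, but only the final iteration survives in group 1.
        System.out.println(lastCaptured("asd , 5 ,2,6 ,8")); // prints 8
    }
}
```

Because only one value survives per match, the approach in this answer produces one match per number instead of one match per line.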

    Pattern

    (?:\w+|\G(?!^))\h*,\h*([0-9]+)
    

    Explanation

    • (?: Non-capturing group
      • \w+ Match 1+ word chars
      • | Or
      • \G(?!^) Assert the position at the end of the previous match, not at the start of the string
    • ) Close the non-capturing group
    • \h*,\h* Match a comma between optional horizontal whitespace chars
    • ([0-9]+) Capture group 1, match 1+ digits
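
To make the role of \G more concrete, the following sketch records the span of every match for the sample string from the question. The first match starts at the leading word; each later match begins exactly where the previous one ended. The class name `GAnchorTrace` and the method `trace` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorTrace {
    private static final Pattern P =
            Pattern.compile("(?:\\w+|\\G(?!^))\\h*,\\h*([0-9]+)");

    // Describes each match as "[start,end) 'matched text' -> captured digits".
    static List<String> trace(String input) {
        List<String> steps = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            steps.add("[" + m.start() + "," + m.end() + ") '"
                    + m.group() + "' -> " + m.group(1));
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : trace("asd , 5 ,2,6 ,8")) {
            System.out.println(step);
        }
    }
}
```

Each reported start offset after the first equals the previous end offset, which is the chaining that \G enforces. Without the (?!^) lookahead, \G would also match at the very start of the input, so a string beginning with a comma and digits could match with no leading word.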


    In Java, with the backslashes doubled for the string literal:

    String regex = "(?:\\w+|\\G(?!^))\\h*,\\h*([0-9]+)";
    

    Example code

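    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Start at a word, or continue right after the previous match (\G)
    String regex = "(?:\\w+|\\G(?!^))\\h*,\\h*([0-9]+)";
    String string = "sdfs6df -> no capture\n\n"
         + "fdg4dfg, 5 -> capture 5\n\n"
         + "fhhh3      ,     6,8    , 7 -> capture 6 8 and 7\n\n"
         + "asdasd1,4,2,7 -> capture 4 2 and 7";
    
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(string);
    
    // Every successful find() holds exactly one number in group 1
    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }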
    

    Output

    5
    6
    8
    7
    4
    2
    7
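
If the numbers are needed per line rather than printed, the same pattern can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the class name `NumberCapture` and the method `extractNumbers` are made up for this example, it assumes every captured number fits in an `int`, and it relies on `\h`, which is available from Java 8 onward. Note that the pattern only looks for a word followed by comma-separated numbers; it does not check that the whole line has that shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberCapture {
    // Same pattern as in the answer; compiled once and reused.
    private static final Pattern NUMBERS =
            Pattern.compile("(?:\\w+|\\G(?!^))\\h*,\\h*([0-9]+)");

    // Collects every number that follows a comma after the leading word.
    // Assumption: the digits fit in an int; parseInt throws otherwise.
    static List<Integer> extractNumbers(String line) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = NUMBERS.matcher(line);
        while (matcher.find()) {
            numbers.add(Integer.parseInt(matcher.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("asd , 5 ,2,6 ,8")); // prints [5, 2, 6, 8]
        System.out.println(extractNumbers("sdfs6df"));         // prints []
    }
}
```

Calling `extractNumbers` once per input line keeps the results of different lines separate, which the single multi-line string in the example above does not.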