How can I get the content for a group with an asterisk?
For example, I'd like to parse a comma-separated list, e.g. 1,2,3,4,5.
private static final String LIST_REGEX = "^(\\d+)(,\\d+)*$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);
public static void main(String[] args) {
final String list = "1,2,3,4,5";
final Matcher matcher = LIST_PATTERN.matcher(list);
System.out.println(matcher.matches());
for (int i = 0, n = matcher.groupCount(); i < n; i++) {
System.out.println(i + "\t" + matcher.group(i));
}
}
And the output is
true
0 1,2,3,4,5
1 1
How can I get every single entry, i.e. 1, 2, 3, ...?
I am looking for a general solution; this is only a demonstrative example.
Please imagine a more complicated regex like ^\\[(\\d+)(,\\d+)*\\]$ to match a list like [1,2,3,4,5].
You can use String.split().
for (String segment : "1,2,3,4,5".split(","))
System.out.println(segment);
Or you can match repeatedly with Matcher.find(), capturing one entry per match:
Pattern pattern = Pattern.compile("(\\d+),?");
for (Matcher m = pattern.matcher("1,2,3,4,5"); m.find();)
System.out.println(m.group(1));
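A minimal runnable sketch of the repeated-matching approach, assuming the entries are plain runs of digits. The class name FindEachEntry and the method entries are hypothetical names chosen for illustration. The pattern matches one entry at a time, so each call to find() yields the next value instead of overwriting a single group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class FindEachEntry {
    // Matches a single run of digits; the commas between entries are skipped.
    private static final Pattern ENTRY = Pattern.compile("\\d+");

    static List<String> entries(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        // find() must succeed before group() may be called;
        // each successful call advances to the next match.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(entries("1,2,3,4,5")); // prints [1, 2, 3, 4, 5]
    }
}
```

Note that this sketch does not validate the overall shape of the input; it only extracts whatever digit runs are present.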
For the second example you added, you can do a similar match:
for (String segment : "!!!!![1,2,3,4,5] //"
.replaceFirst("^\\D*(\\d+(?:,\\d+)*)\\D*$", "$1")
.split(","))
System.out.println(segment);
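If the input should also be validated, one possible variant is to capture the whole list body as a single group and split it afterwards. The sketch below assumes the strict bracketed format from the question; the class name BracketedList and the method parse are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class BracketedList {
    // Group 1 captures everything between the brackets in one piece;
    // the repeated part is a non-capturing group, so nothing gets overwritten.
    private static final Pattern LIST = Pattern.compile("^\\[(\\d+(?:,\\d+)*)\\]$");

    static List<String> parse(String input) {
        Matcher m = LIST.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a bracketed list: " + input);
        }
        // The body is known to be digits separated by commas, so split is safe here.
        return Arrays.asList(m.group(1).split(","));
    }

    public static void main(String[] args) {
        System.out.println(parse("[1,2,3,4,5]")); // prints [1, 2, 3, 4, 5]
    }
}
```

The regex does the validation and the split does the extraction, which sidesteps the need to read every iteration of a repeated group.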
I made an online code demo. I hope this is what you wanted.
How can I get all the matches (zero, one or more) for an arbitrary group with an asterisk, such as (xyz)*? [The group is repeated and I would like to get every repeated capture.]
No, you cannot. The article "Regex Capture Groups and Back-References" explains why:
The Returned Value for a Given Group is the Last One Captured
Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
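The quoted behaviour can be reproduced in Java with the same string and pattern. The sketch below uses hypothetical names (LastCaptureDemo, lastCapture, allCaptures). It first shows that the quantified group keeps only its last iteration, then recovers every iteration by matching the group's inner pattern on its own with find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class LastCaptureDemo {
    // Returns what group 1 holds after a full match of ([A-Z]_)+, or null if no match.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("([A-Z]_)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Collects every occurrence by matching the inner pattern repeatedly.
    static List<String> allCaptures(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Z]_").matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("A_B_C_D_")); // prints D_
        System.out.println(allCaptures("A_B_C_D_")); // prints [A_, B_, C_, D_]
    }
}
```

For a general solution, this suggests a two-step approach: use the full pattern with matches() to validate the input, then apply the repeated sub-pattern with find() to extract the individual values.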