I have a string like
String str = "美国临时申请No.62004615";
And a regex like
String regex = "(((美国|PCT|加拿大){0,1})([\\u4E00-\\u9FA5]{1,8})((NO.|NOS.){1})([\\d]{5,}))";
And other code is
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println("1:" + matcher.group(1) + "\n"
            + "2:" + matcher.group(2) + "\n"
            + "3:" + matcher.group(3) + "\n"
            + "4:" + matcher.group(4) + "\n"
            + "5:" + matcher.group(5) + "\n"
            + "6:" + matcher.group(6) + "\n"
            + "7:" + matcher.group(7));
}
I know that parentheses () are used to group parts of a regex, and that group 1 is the outermost group.
I expected the second group to be ((美国|PCT|加拿大){0,1}), matching "美国", "PCT" or "加拿大". The third group would be ([\u4E00-\u9FA5]{1,8}), matching one to eight Chinese characters. The fourth group would be ((NO.|NOS.){1}), matching "NO." or "NOS.". The fifth group would be ([\d]{5,}), matching the number.
But the console output is:
    1:美国临时申请No.62004615
    2:美国
    3:美国
    4:临时申请
    5:No.
    6:No.
    7:62004615
Group 2 is the same as group 3, and group 5 is the same as group 6.
It seems that group 3 matches the inner parentheses nested inside group 2 a second time. Is there a way to capture only the outer parentheses? The ideal result would be:
    1:美国临时申请No.62004615
    2:美国
    3:临时申请
    4:No.
    5:62004615
It sounds like you want a non-capturing group. From the Pattern documentation:
(?:X)    X, as a non-capturing group
So, change this:
(美国|PCT|加拿大)
to this:
(?:美国|PCT|加拿大)
… and then it will not be represented as a group at all in the Matcher. The same change applies to (NO.|NOS.), which becomes (?:NO.|NOS.).
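As a rough illustration of the change, here is a minimal sketch that applies non-capturing groups to both inner alternations of the regex from the question. The class name NonCapturingGroupDemo and the helper groups are made up for this example; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question.
class NonCapturingGroupDemo {
    // Same regex as in the question, except that the two inner
    // alternations now start with (?: so they no longer capture.
    static final String REGEX =
            "(((?:美国|PCT|加拿大){0,1})([\\u4E00-\\u9FA5]{1,8})((?:NO.|NOS.){1})([\\d]{5,}))";

    // Returns groups 1..groupCount() of the first match, or an empty list.
    static List<String> groups(String input) {
        Matcher matcher = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> result = new ArrayList<>();
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = groups("美国临时申请No.62004615");
        for (int i = 0; i < found.size(); i++) {
            System.out.println((i + 1) + ":" + found.get(i));
        }
        // Prints:
        // 1:美国临时申请No.62004615
        // 2:美国
        // 3:临时申请
        // 4:No.
        // 5:62004615
    }
}
```

Groups are numbered by the position of their opening parenthesis, counting only capturing ones, so removing the two inner captures takes the count from seven to five.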
Some side notes:
{0,1} is the same as writing ?.
{1} does nothing and can be removed entirely.
[\\d] is the same as just \\d.
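To check that these simplifications do not change the result, the following sketch compares the verbose pattern with a shortened one on the string from the question. The names SimplifiedPatternDemo, VERBOSE, SIMPLIFIED and groupsOf are invented for this example, and the shortened regex is just one possible way of applying the notes above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class SimplifiedPatternDemo {
    // The question's regex with non-capturing inner groups.
    static final String VERBOSE =
            "(((?:美国|PCT|加拿大){0,1})([\\u4E00-\\u9FA5]{1,8})((?:NO.|NOS.){1})([\\d]{5,}))";

    // The same regex after the side notes: {0,1} becomes ?, {1} is dropped
    // (so its inner group is no longer needed), and [\\d] becomes \\d.
    static final String SIMPLIFIED =
            "(((?:美国|PCT|加拿大)?)([\\u4E00-\\u9FA5]{1,8})(NO.|NOS.)(\\d{5,}))";

    // Formats groups 1..groupCount() of the first match as "n:text" lines.
    static String groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        StringBuilder sb = new StringBuilder();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                sb.append(i).append(':').append(m.group(i)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "美国临时申请No.62004615";
        String verbose = groupsOf(VERBOSE, str);
        String simplified = groupsOf(SIMPLIFIED, str);
        System.out.print(simplified);
        System.out.println("same groups: " + verbose.equals(simplified));
    }
}
```

Note that the unescaped . in NO. and NOS. still matches any character, exactly as in the original regex; writing \\. would restrict it to a literal dot.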