Tags: java, regex, intellij-idea, regex-group

Why does my regex give me only one element instead of the complete group in Java?


I am new here and this is my first post. I'm also new to Java and regex. So...

I am working in Java and writing a regex to match phone numbers in a chat that use the format +XXX (XXX) XXX XXXX, written as digits and/or as words.

This is my regex:

"\\+?(zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){1,3}?[._\\-]? \\(?(zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){3}\\)?[._\\-]? (zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){3}[._\\-]?(zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){4}";

Printing the complete match with the .group() method works fine, but if I print each group separately with this loop:

for (int i = 1; i <= match.groupCount(); i++) {
    if (match.group(i) != null) {
        System.out.println("Group " + i + ": " + match.group(i));
    }
}

then it prints only one digit (or word) from each group.

Why is that? And what do I have to do to print every element of every group?

I've also tried this regex:

"\\+?(zero{1,3}|uno{1,3}|due{1,3}|tre{1,3}|quattro{1,3}|cinque{1,3}|sei{1,3}|sette{1,3}|otto{1,3}|nove{1,3}|\\d{1,3})?[._\\-]?\\(?(zero{3}|uno{3}|due{3}|tre{3}|quattro{3}|cinque{3}|sei{3}|sette{3}|otto{3}|nove{3}|\\d{3})\\)?[._\\-]?(zero{3}|uno{3}|due{3}|tre{3}|quattro{3}|cinque{3}|sei{3}|sette{3}|otto{3}|nove{3}|\\d{3})[._\\-]?(zero{4}|uno{4}|due{4}|tre{4}|quattro{4}|cinque{4}|sei{4}|sette{4}|otto{4}|nove{4}|\\d{4})";

...and also this one:

"\\+?(zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d{1,3})?[._\\-]?\\(?(zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d{3})\\)?[._\\-]?(zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d{3})[._\\-]?(zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d{4})";

Thanks in advance! :)


Solution

  • You need to make the repeated groups non-capturing, (?: ... ), and then wrap each one, together with its quantifier, in a capturing group. Something like:

    "\\+?((?:zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){1,3}?)[._\\- ]?\\(?((?:zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){3})\\)?[._\\- ]?((?:zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){3})[._\\- ]?((?:zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d){4})";
    

    Reasoning: when a capturing group is repeated by a quantifier, it keeps only the text matched by its last repetition.

    Also, I've added a space to the delimiter character classes.
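
To see the difference in isolation, here is a minimal sketch. The class name PhoneGroupsDemo, the DIGIT constant, the groups() helper and the sample inputs are made up for illustration and are not from the question; the pattern is meant to be the same as the one above, only assembled from a shared fragment to keep it readable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneGroupsDemo {
    // One digit, written as an Italian word or as a numeral (non-capturing).
    static final String DIGIT =
            "(?:zero|uno|due|tre|quattro|cinque|sei|sette|otto|nove|\\d)";

    // Each capturing group wraps the non-capturing DIGIT group plus its quantifier.
    static final Pattern PHONE = Pattern.compile(
            "\\+?(" + DIGIT + "{1,3}?)[._\\- ]?"
            + "\\(?(" + DIGIT + "{3})\\)?[._\\- ]?"
            + "(" + DIGIT + "{3})[._\\- ]?"
            + "(" + DIGIT + "{4})");

    // Hypothetical helper: groups 1..n of the first match, or an empty list if none.
    static List<String> groups(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Quantifier outside the capturing group: only the last repetition is kept.
        System.out.println(groups(Pattern.compile("(\\d){3}"), "123"));      // [3]
        // Quantifier inside the capturing group: the whole run is kept.
        System.out.println(groups(Pattern.compile("((?:\\d){3})"), "123")); // [123]

        System.out.println(groups(PHONE, "+39 (345) 678 9012"));      // [39, 345, 678, 9012]
        System.out.println(groups(PHONE, "+39 (tre45) 678 nove012")); // [39, tre45, 678, nove012]
    }
}
```

If you need every single digit or word rather than the whole run, one option would be to run a second Matcher with the DIGIT fragment over each captured group.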