java regex capturing-group

Regex capturing group doesn't recognise group(1) despite matches() true


I'm writing some simple (I thought) regex in Java to remove an asterisk or ampersand which occurs directly next to some specified punctuation.
This was my original code:

String ptr = "\\s*[\\*&]+\\s*";
String punct1 = "[,;=\\{}\\[\\]\\)]"; //need two because bracket rules different for ptr to left or right
String punct2 = "[,;=\\{}\\[\\]\\(]";

out = out.replaceAll(ptr+"("+punct1+")|("+punct2+")"+ptr,"$1");

Instead of removing only the "ptr" part of the match, this removed the punctuation too (i.e. it replaced the whole matched string with an empty string).
I investigated further by doing:

String ptrStr = ".*"+ptr+"("+punct1+")"+".*|.*("+punct2+")"+ptr+".*";
Matcher m_ptrStr = Pattern.compile(ptrStr).matcher(out);

and found that:

m_ptrStr.matches() //returns true, but...
m_ptrStr.group(1) //returns null??

I have no idea what I'm doing wrong, as I've used this exact method before with far more complicated regexes and group(1) has always returned the captured group. There must be something I haven't been able to spot. Any ideas?


Solution

  • The problem is that you have an alternation with a capturing group on each side:

    (regex1)|(regex2)
    

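    The matcher first tries the first alternative; if that fails at the current position, it tries the second.

    However, these are still two separate groups, and only one of them takes part in any given match. The group that did not take part returns null from group(), which is what happens here: group(1) is null because the second alternative matched, so the captured text is in group(2).

    You therefore need to test both groups; since you have a match, at least one of them will not be null. The same applies to replaceAll: "$1" expands to an empty string whenever the second alternative is the one that matched.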
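
    A minimal sketch of both fixes, reusing the patterns from the question. The class name PtrCleanup and the helpers strip and capturedPunct are made up for illustration, and the sample input is an assumption about what the data looks like. In the replacement string, "$1$2" works because a group that did not take part in the match contributes nothing, so exactly one of the two references expands to the punctuation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the three pattern strings are copied from the question.
class PtrCleanup {
    static final String PTR = "\\s*[\\*&]+\\s*";
    static final String PUNCT1 = "[,;=\\{}\\[\\]\\)]";
    static final String PUNCT2 = "[,;=\\{}\\[\\]\\(]";

    // Group 1 belongs to the first alternative, group 2 to the second.
    static final Pattern P =
            Pattern.compile(PTR + "(" + PUNCT1 + ")|(" + PUNCT2 + ")" + PTR);

    // Reference both groups: the one that did not participate expands to "".
    static String strip(String in) {
        return P.matcher(in).replaceAll("$1$2");
    }

    // When reading groups directly, test both; call only after a successful match.
    static String capturedPunct(Matcher m) {
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        // Sample input is illustrative only.
        System.out.println(strip("foo(*a, b &);"));
    }
}
```

    With the original "$1" replacement, the "(*" in that sample would be replaced by nothing, because it is matched by the second alternative and group 1 is empty there.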