I'm seeing strange behavior on a really simple problem.
I have a string with a lot of "null" items:
"a;b;c;null;null;null;null;null;null;null;null;null;null"
Which I remove using this method:
public String replaceAllNull(String s) {
    s = s.replaceAll(";null;", ";;");
    // if the first item is null, remove it
    if (s.startsWith("null;")) {
        s = s.substring(4, s.length());
    }
    // if the last item is null, remove it
    if (s.endsWith(";null")) {
        s = s.substring(0, s.length() - 4);
    }
    return s;
}
It was working fine until my string became bigger and I saw this strange output:
"a;b;c;;null;;null;;null;;null;;"
It only removes every other occurrence.
I think I understand that during one replacement the program skips one ";", so the second null is not recognized by my regex ";null;". But I don't get why this happens.
After one instance of ";null;" is replaced by ";;", both of its semicolons have already been consumed, so the second ; cannot serve as the start of the next ";null;" match. The pattern cannot match again until the scan has passed over another "null" and reached the next semicolon.
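To make the skipping visible, here is a minimal sketch that prints where each match of ";null;" starts and ends on a shortened input. The class name OverlapDemo and the helper naiveReplace are hypothetical names chosen for illustration; they are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapDemo {
    // Hypothetical helper: the same replacement the question uses.
    static String naiveReplace(String s) {
        return s.replaceAll(";null;", ";;");
    }

    public static void main(String[] args) {
        String s = "a;null;null;null;b";
        Matcher m = Pattern.compile(";null;").matcher(s);
        // Each search resumes at the end of the previous match, so the
        // semicolon at index 6 belongs to the first match and cannot
        // begin a second one.
        while (m.find()) {
            System.out.println("match at [" + m.start() + ", " + m.end() + ")");
        }
        // prints: match at [1, 7)
        // prints: match at [11, 17)
        System.out.println(naiveReplace(s)); // prints: a;;null;;b
    }
}
```

The middle "null" (indices 7 to 10) is never matched because its leading semicolon was already consumed by the first match.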
What you can use is a pattern that doesn't consume the semicolons but still checks that they are there. You can use a positive lookbehind and a positive lookahead (find "lookahead" and "lookbehind" on the linked page). Here, positive means that it verifies that the pattern of the lookbehind/lookahead exists, but doesn't include it in the match.
The positive lookbehind is of the format (?<=X), where X is the pattern to look behind the main pattern to see if it exists. Also, the positive lookahead is of the format (?=X), where X is the pattern to look ahead of the main pattern to see if it exists.
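As a small illustration of "verifies but doesn't match", the sketch below compares the text consumed by a plain pattern with the text consumed by the lookaround version. The class LookaroundSpanDemo and the helper firstMatch are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundSpanDemo {
    // Hypothetical helper: returns the text consumed by the first match
    // of regex in input, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The plain pattern consumes both semicolons.
        System.out.println(firstMatch(";null;", "a;null;b"));          // prints: ;null;
        // The lookarounds only check for the semicolons; the match is just "null".
        System.out.println(firstMatch("(?<=;)null(?=;)", "a;null;b")); // prints: null
    }
}
```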
Here, we look for the beginning of the string ^ or a semicolon before the match, and the end of the string $ or a semicolon after the match. Then we simply replace the actual match, "null", with an empty string.
s = s.replaceAll("(?<=^|;)null(?=$|;)", "");
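For reference, here is one way the whole method from the question might look with this single replacement, as a sketch rather than a definitive rewrite. The class name NullCleaner and the method name removeNullItems are hypothetical. Because ^ and $ are handled inside the lookarounds, the separate startsWith/endsWith checks are no longer needed.

```java
class NullCleaner {
    // Hypothetical replacement for replaceAllNull: removes every item that
    // is exactly "null", including the first and last items, and leaves
    // the separating semicolons in place.
    static String removeNullItems(String s) {
        return s.replaceAll("(?<=^|;)null(?=$|;)", "");
    }

    public static void main(String[] args) {
        System.out.println(removeNullItems("null;a;null;null;b;null")); // prints: ;a;;;b;
        // Items that merely contain "null" are left alone.
        System.out.println(removeNullItems("nullable;null"));           // prints: nullable;
    }
}
```

Since the lookarounds do not consume the semicolons, adjacent "null" items can share a separator and all of them are removed in one pass.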