Tags: java, string, replaceall

Java replaceAll replaces only one of every two occurrences


I'm seeing strange behavior on what should be a really simple problem.

I have a string containing a lot of literal "null" entries:

"a;b;c;null;null;null;null;null;null;null;null;null;null"

I remove them using this method:

public String replaceAllNull(String s) {
    s = s.replaceAll(";null;", ";;");

    //if first item = null remove it
    if(s.startsWith("null;")) {
        s = s.substring(4,s.length());
    }

    //if last item = null remove it
    if(s.endsWith(";null")) {
        s = s.substring(0,s.length()-4);
    }
    return s;
}

It was working fine until my string became bigger and I saw this strange output:

"a;b;c;;null;;null;;null;;null;;"

It only removes every other occurrence.

I think I understand that after one replacement the program skips past one ";", so the second "null" is no longer matched by my regex ";null;". But why does this happen?


Solution

  • replaceAll scans the string from left to right, and its matches never overlap. Once one instance of ";null;" has been replaced by ";;", both of its semicolons have been consumed, so the trailing ; cannot also serve as the start of the next ";null;" match. The pattern cannot match again until the scan has passed over another "null" and reached the semicolon after it.

    What you need is a pattern that checks that the semicolons are there without making them part of the match. You can use a positive lookbehind and a positive lookahead (find "lookahead" and "lookbehind" on the linked page). Here, positive means that the lookbehind/lookahead asserts that its pattern is present, but does not consume it.

    A positive lookbehind has the form (?<=X), where X is the pattern that must appear immediately before the main match. Similarly, a positive lookahead has the form (?=X), where X is the pattern that must appear immediately after the main match.

    Here, we require the beginning of the input ^ or a semicolon before the match, and the end of the input $ or a semicolon after the match. Then we simply replace the actual match, "null", with an empty string.

    s = s.replaceAll("(?<=^|;)null(?=$|;)", "");
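
To see the difference between the two patterns, here is a minimal sketch that records where each one matches. The class name NullFieldDemo and the helpers matchStarts and removeNullFields are made up for illustration; only Pattern, Matcher and String.replaceAll are real Java APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class NullFieldDemo {

    // Returns the start index of every match that find() reports.
    // Matches never overlap: each search resumes where the previous match ended.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Blanks out every field that is exactly "null" and keeps the separators.
    // The lookarounds check the neighbours without consuming them.
    static String removeNullFields(String s) {
        return s.replaceAll("(?<=^|;)null(?=$|;)", "");
    }
}
```

For the input "a;null;null;null;b", matchStarts(";null;", ...) finds matches only at indices 1 and 11, because the semicolon at index 6 is consumed by the first match and the middle "null" is skipped. With the lookaround pattern, the matches start at 2, 7 and 12, and removeNullFields returns "a;;;;b". Because the lookarounds already handle the start and end of the string, the separate startsWith/endsWith checks from the question should no longer be needed.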