java split apache-stringutils apache-commons-lang3

Difference between splitByWholeSeparator, splitPreserveAllTokens, and splitByWholeSeparatorPreserveAllTokens


In the org.apache.commons.lang3.StringUtils class, what is the difference between splitByWholeSeparator(String, String), splitPreserveAllTokens(String, String) and splitByWholeSeparatorPreserveAllTokens(String, String)? I checked the JavaDoc and it is not clear at all why I would use one method over the others.


Solution

  • After looking over the documentation, I think I see where the confusion comes from.

    1. split treats every character of the separator string as a separator char. Adjacent separator chars are treated as one, so there are no empty array elements.
    2. splitPreserveAllTokens does the same, but adjacent separator chars produce empty array elements.
    3. splitByWholeSeparator splits on the whole separator string. Adjacent separator strings are treated as one, so there are no empty array elements.
    4. splitByWholeSeparatorPreserveAllTokens does the same, but adjacent separator strings produce empty array elements.

    Note:
    The function always adds the remaining characters after the last separator. If the String ends with a separator, it adds an empty String, because the remaining length is not checked at that point.

    An example:

    String: "a,b,;,;e,f,,g,h"
    Separator: ",;"
    
    split: ["a","b","e","f","g","h"]
    splitPreserveAllTokens: ["a","b","","","","e","f","","g","h"]
    splitByWholeSeparator: ["a,b","e,f,,g,h"]
    splitByWholeSeparatorPreserveAllTokens: ["a,b","","e,f,,g,h"]
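
To make the two axes concrete (separator chars vs. whole separator string, and collapsing vs. preserving empty tokens), here is a minimal sketch in plain Java that mimics the behavior described above for the example input. It is not the Commons Lang implementation: the class SplitSketch and the helpers splitOnChars and splitOnWhole are hypothetical names made up for illustration, and null input, empty input, and the trailing-separator case from the note are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the StringUtils source.
final class SplitSketch {

    // Every character of seps is its own delimiter.
    // preserveEmpty=false mimics split, true mimics splitPreserveAllTokens.
    static String[] splitOnChars(String s, String seps, boolean preserveEmpty) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= s.length(); i++) {
            // A token ends at a separator char or at the end of the input.
            if (i == s.length() || seps.indexOf(s.charAt(i)) >= 0) {
                String token = s.substring(start, i);
                if (preserveEmpty || !token.isEmpty()) {
                    out.add(token);
                }
                start = i + 1;
            }
        }
        return out.toArray(new String[0]);
    }

    // The whole string sep is the delimiter.
    // preserveEmpty=false mimics splitByWholeSeparator,
    // true mimics splitByWholeSeparatorPreserveAllTokens.
    static String[] splitOnWhole(String s, String sep, boolean preserveEmpty) {
        if (sep.isEmpty()) {
            // Assumption of this sketch: an empty separator is rejected.
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> out = new ArrayList<>();
        int start = 0;
        while (true) {
            int hit = s.indexOf(sep, start);
            int end = (hit < 0) ? s.length() : hit;
            String token = s.substring(start, end);
            if (preserveEmpty || !token.isEmpty()) {
                out.add(token);
            }
            if (hit < 0) {
                break; // no further separator: the rest was the last token
            }
            start = hit + sep.length();
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "a,b,;,;e,f,,g,h";
        String sep = ",;";
        // prints [a, b, e, f, g, h]
        System.out.println(Arrays.toString(splitOnChars(input, sep, false)));
        // prints [a, b, , , , e, f, , g, h]
        System.out.println(Arrays.toString(splitOnChars(input, sep, true)));
        // prints [a,b, e,f,,g,h]
        System.out.println(Arrays.toString(splitOnWhole(input, sep, false)));
        // prints [a,b, , e,f,,g,h]
        System.out.println(Arrays.toString(splitOnWhole(input, sep, true)));
    }
}
```

Note that Arrays.toString joins elements with ", ", so the last two lines contain two and three elements respectively ("a,b" and "e,f,,g,h", with an empty string between them in the preserving variant).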