java delimiter string-parsing

Java - String Parsing or split() bug in using multiple delimiters


Ok, you might say that this is a duplicate post but it is different.

I am working on a program that removes user-specified delimiters from a message string. It works when every delimiter is a single character (special or not). However, if a delimiter is a multi-character string, the program removes every individual character of that delimiter from the message, wherever it appears.

For example, with String message = "ab\nc[d]e{fMardk1g(h)i}j"; the output is bcefghij, but the expected output is abcdefghij.

I'm new to the Pattern class, so I don't know where the problem lies.

Here's the code in question (I put it in a testing class so I can isolate the problem):

import java.util.regex.Pattern;

public class ParsingTest {
    public static void main(String[] args) {
        String[] delimiters = { "Mardk1", "\n", "[", "]", "{", "}", "(", ")" };  
        StringBuilder regexp = new StringBuilder("");  
        regexp.append("[");  
        for(String s : delimiters) {  
            regexp.append("[");  
            regexp.append(Pattern.quote(s));  
            regexp.append("]");  
        }  
        regexp.append("]");  

        String message = "ab\nc[d]e{fMardk1g(h)i}j";  
        StringBuilder result = new StringBuilder("");  
        String[] a = message.split(regexp.toString());  
        for(String string : a) {  
            result.append(string);
        }
        System.out.println(result);
        for(String str: a) System.out.print(str);
        System.out.println();
    }
}

Solution

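  • You're using the wrong kind of grouping construct. You're building a character class like [xyz], which matches any single character x, y, or z. Because the quoted "Mardk1" ends up inside that class, each of M, a, r, d, k, and 1 becomes a delimiter on its own, which is why a and d vanish from your output. You want to match any one of several whole strings, so you need ordinary (...) grouping with the alternation operator (|), as in (abc|xyz). Have a look at the Pattern documentation for more details.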

    Try this instead to build up the regex:

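    // Open the group; the loop below fills in the alternatives.
    StringBuilder regexp = new StringBuilder("(");
    for (String s : delimiters) {
        // Separate alternatives with "|", but don't start with "(|".
        if (regexp.length() > 1) {
            regexp.append("|");
        }
        // Quote each delimiter so characters like [ and ( are matched literally.
        regexp.append(Pattern.quote(s));
    }
    regexp.append(")");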
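
For completeness, here is a minimal, self-contained sketch of the whole program using the alternation approach described above. The class name DelimiterStripper and the helper methods buildRegex and strip are made up for this illustration and are not from the original code. It also uses StringJoiner and String.replaceAll instead of the StringBuilder and split() from the question, which is a substitution on my part: StringJoiner handles the "no leading |" case on its own, and replaceAll avoids having to glue the split pieces back together.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helpers are illustrative only.
public class DelimiterStripper {

    // Builds an alternation such as (\QMardk1\E|\Q[\E|...) from literal delimiters.
    public static String buildRegex(String[] delimiters) {
        // StringJoiner inserts "|" only between elements and wraps the result in "(" and ")".
        StringJoiner alternatives = new StringJoiner("|", "(", ")");
        for (String s : delimiters) {
            // Pattern.quote makes regex metacharacters such as [ and ( match literally.
            alternatives.add(Pattern.quote(s));
        }
        return alternatives.toString();
    }

    // Removes every occurrence of any whole delimiter from the message.
    public static String strip(String message, String[] delimiters) {
        return message.replaceAll(buildRegex(delimiters), "");
    }

    public static void main(String[] args) {
        String[] delimiters = { "Mardk1", "\n", "[", "]", "{", "}", "(", ")" };
        String message = "ab\nc[d]e{fMardk1g(h)i}j";
        System.out.println(strip(message, delimiters)); // prints abcdefghij
    }
}
```

One thing to watch for: alternation tries the alternatives from left to right, so if one delimiter is a prefix of another (say "ab" and "abc"), list the longer one first, otherwise the shorter one always wins.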