
Inconsistent behaviour of StrTokenizer when splitting a string


I'm trying to split a string at a given delimiter while ignoring delimiters that appear inside quotes. E.g.

"foo; bar; 'foo; bar'"

should be split into 3 strings, given the delimiter ; and the quote character ':

foo
bar
foo; bar

I'm using StrTokenizer as shown below. It doesn't seem to work for "foo; bar; 'foo; bar'", but it does work for "'foo; bar'; foo; bar".

Can anyone explain what is wrong?

import org.apache.commons.lang3.text.StrTokenizer;

public class Main {
    public static void main(String[] args) {

        String x = "foo; bar; 'foo; bar'";

        // Delimiter ';', quote character '\''
        StrTokenizer tokens = new StrTokenizer(x, ';', '\'');

        for (String token : tokens.getTokenArray()) {
            System.out.println(token.trim());
        }
        // Prints:
        // foo
        // bar
        // 'foo
        // bar'

        // This input, with the quoted part first, is tokenized as expected:
        x = "'foo; bar'; foo; bar";

        tokens = new StrTokenizer(x, ';', '\'');

        for (String token : tokens.getTokenArray()) {
            System.out.println(token.trim());
        }
        // Prints:
        // foo; bar
        // foo
        // bar
    }
}

Solution

  • It looks like, by default, a quoted section must begin at the very start of a token: it can't be preceded by any character (not even a space) other than the delimiter. So "; 'quote'" is not OK, but ";'quote'" is fine. This is a little strange, because a space between the closing quote and the next delimiter doesn't seem to cause any problem, which suggests that this may be a bug.

    Explicitly setting the characters that should be trimmed seems to solve this problem (you also no longer need to call trim() in your printing statements):

    // also requires: import org.apache.commons.lang3.text.StrMatcher;
    StrTokenizer tokens = new StrTokenizer(x, ';', '\'');
    tokens.setTrimmerMatcher(StrMatcher.spaceMatcher()); // <- add this line
    

    To trim spaces, tabs, newlines and form feeds, use StrMatcher.splitMatcher() instead.
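
For comparison, here is a minimal, dependency-free sketch of the splitting rule the question asks for: split at the delimiter, ignore delimiters inside quotes, and trim whitespace around each token. It is not how StrTokenizer is implemented. The class QuotedSplitSketch and the helper splitQuoted are hypothetical names made up for this illustration, and the sketch assumes quotes are never escaped or nested.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedSplitSketch {

    // Hypothetical helper (not part of Commons Lang): splits input at
    // delimiter, treating any delimiter between two quote characters as
    // ordinary text. Assumes no escaped or nested quotes.
    static List<String> splitQuoted(String input, char delimiter, char quote) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == quote) {
                // Toggle quoted mode; the quote character itself is dropped.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                // Unquoted delimiter: finish the current token.
                tokens.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Add the last token. Note that trim() also removes whitespace at
        // the edges of a quoted section, and that a trailing delimiter
        // produces an empty final token.
        tokens.add(current.toString().trim());
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : splitQuoted("foo; bar; 'foo; bar'", ';', '\'')) {
            System.out.println(token);
        }
        // Prints:
        // foo
        // bar
        // foo; bar
    }
}
```

Because the whitespace is trimmed only after the whole token has been collected, a space before the opening quote makes no difference here, which is the behaviour the trimmer matcher gives you in StrTokenizer.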