I'm trying to split a string at a given delimiter while ignoring any delimiters that appear inside quotes. E.g.
"foo; bar; 'foo; bar'"
should be split into 3 strings, given the delimiter ; and the quote character ' (single quote):
foo
bar
foo; bar
I'm using StrTokenizer as shown below. It doesn't seem to work for "foo; bar; 'foo; bar'",
but it does work for "'foo; bar'; foo; bar".
Can anyone explain what is wrong?
import org.apache.commons.lang3.text.StrTokenizer;

public class Main {
    public static void main(String[] args) {
        String x = "foo; bar; 'foo; bar'";
        StrTokenizer tokens = new StrTokenizer(x, ';', '\'');
        for (String token : tokens.getTokenArray()) {
            System.out.println(token.trim());
        }
        // Prints:
        // foo
        // bar
        // 'foo
        // bar'

        /* --------- */

        // THIS IS OK:
        x = "'foo; bar'; foo; bar";
        tokens = new StrTokenizer(x, ';', '\'');
        for (String token : tokens.getTokenArray()) {
            System.out.println(token.trim());
        }
        // Prints:
        // foo; bar
        // foo
        // bar
    }
}
It looks like, by default, a quoted section can't be preceded by any character other than the delimiter, not even a space (so ; 'quote' is not OK, but ;'quote' is fine). This is a little strange, because a space between the end of a quote and the delimiter doesn't seem to cause any problem, which may suggest that this is a bug.
Explicitly setting the characters that should be trimmed seems to solve this problem (you also no longer need to call trim() in your printing statements):
// also needs: import org.apache.commons.lang3.text.StrMatcher;
StrTokenizer tokens = new StrTokenizer(x, ';', '\'');
tokens.setTrimmerMatcher(StrMatcher.spaceMatcher()); // <- add this line
To trim on space, tab, newline and formfeed, use StrMatcher.splitMatcher() instead.
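
If it helps to see the intended behavior without the library, here is a minimal hand-rolled sketch of a quote-aware split. This is not how StrTokenizer is implemented; the class name QuoteAwareSplit and the method split are made up for illustration. It assumes quotes are never escaped or nested, and it trims whitespace around every token (including whitespace just inside the quotes), so treat it as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons Lang.
public class QuoteAwareSplit {

    // Splits input at delimiter, ignoring delimiters inside quoted sections.
    // Quote characters are dropped and each token is trimmed.
    public static List<String> split(String input, char delimiter, char quote) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == quote) {
                // Toggle quoted state; the quote itself is not kept.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                // Unquoted delimiter ends the current token.
                tokens.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Last token (there is no trailing delimiter to flush it).
        tokens.add(current.toString().trim());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("foo; bar; 'foo; bar'", ';', '\''));
        System.out.println(split("'foo; bar'; foo; bar", ';', '\''));
    }
}
```

Because the space before the opening quote is simply collected and trimmed at the end, both input orderings from the question give the same three tokens here.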