Is there a default/easy way in Java to split strings while taking care of quotation marks or other symbols?
For example, given this text:
There's "a man" that live next door 'in my neighborhood', "and he gets me down..."
I'd like to obtain:
There's
a man
that
live
next
door
in my neighborhood
and he gets me down
Something like this works for your input:
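import java.util.Scanner;
import java.util.regex.Pattern;

String text = "There's \"a man\" that live next door "
        + "'in my neighborhood', \"and he gets me down...\"";
Scanner sc = new Scanner(text);
Pattern pattern = Pattern.compile(
        "\"[^\"]*\"" +    // double-quoted token
        "|'[^']*'" +      // single-quoted token
        "|[A-Za-z']+"     // bare word (apostrophes allowed)
);
String token;
// findInLine returns null once there is no further match on the current line
while ((token = sc.findInLine(pattern)) != null) {
    System.out.println("[" + token + "]");
}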
The above prints (as seen on ideone.com):
[There's]
["a man"]
[that]
[live]
[next]
[door]
['in my neighborhood']
["and he gets me down..."]
It uses Scanner.findInLine, where the regex pattern is one of:
"[^"]*" # double quoted token
'[^']*' # single quoted token
[A-Za-z']+ # any other word (letters and apostrophes only)
This certainly won't work in every case; input where quotes can be nested, for example, will be tricky.
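Note that the output above keeps the surrounding quotation marks, whereas the question asked for the tokens without them. One way to drop them is to put a capture group around the body of each alternative and read whichever group matched. The following is only a minimal sketch under that assumption: the class name QuoteAwareSplitter and the method split are made up for illustration, and it uses Matcher.find instead of Scanner.findInLine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
class QuoteAwareSplitter {
    // Same three alternatives as the pattern above, with a capture group
    // around the part of each token that should be kept.
    private static final Pattern TOKEN = Pattern.compile(
            "\"([^\"]*)\""      // group 1: contents of a double-quoted token
            + "|'([^']*)'"      // group 2: contents of a single-quoted token
            + "|([A-Za-z']+)"); // group 3: bare word

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Only one alternative takes part in each match; the groups of
            // the other alternatives are null.
            if (m.group(1) != null) {
                tokens.add(m.group(1));
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
                + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : split(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

For the sample text this yields the tokens without their quotes, but the last one is still "and he gets me down..." with the trailing dots, since everything between the quotes is kept; trimming punctuation inside quoted tokens would need a separate step. The same limitations apply as before: nested or escaped quotes are not handled, and digits or punctuation outside quotes are skipped.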