So, I have a simple string that looks like this:
word1 word2! word3? word4; word5, word6
word7 //new line
!word8! word9 word10 word11 word12
I want to split this string while keeping the whitespace and newline delimiters.
Right now I'm using the s.split() method with the [\\s\\r\\n]
expression as its argument, and the output is:
[word1, word2!, word3?, word4;, word5,, word6, , word7, , !word8!, word9, word10, word11, word12]
I'm okay with the whitespace not being saved. But how can I stop a \n
from being treated like any other whitespace?
UPD: I pass this string through a RabbitMQ queue. In Java it looks like this:
"word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12"
You can tokenize the text into alternating non-whitespace and whitespace chunks with the \S+|\s+
regex: \S+ matches a run of non-whitespace characters and \s+ matches a run of whitespace, so nothing in the input is dropped.
See the Java demo:
import java.util.*;
import java.util.regex.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String line = "word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12";
        // Match either a run of non-whitespace or a run of whitespace
        Pattern p = Pattern.compile("\\S+|\\s+");
        Matcher m = p.matcher(line);
        List<String> res = new ArrayList<>();
        // Collect every chunk in order of appearance
        while (m.find()) {
            res.add(m.group());
        }
        System.out.println(res);
    }
}
Output:
[word1, , word2!, , word3?, , word4;, , word5,, , word6,
, word7,
, !word8!, , word9, , word10, , word11, , word12]
where the line breaks in the printed list are the literal \n tokens.
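If the spaces themselves are not needed and only the line breaks should survive as tokens, the same approach might be narrowed. This is a minimal sketch, not part of the demo above: the class name KeepLineBreaks and the helper tokenize are made up for illustration, and it assumes Java 8 or later, where \R matches a single line break sequence (\n, \r\n, \r and a few others).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepLineBreaks {
    // Hypothetical helper: returns the words plus one token per line break,
    // skipping spaces and tabs entirely.
    static List<String> tokenize(String text) {
        // \S+ matches a run of non-whitespace; \R matches one line break sequence.
        Pattern p = Pattern.compile("\\S+|\\R");
        Matcher m = p.matcher(text);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "word1 word2! word3? word4; word5, word6\nword7\n!word8! word9 word10 word11 word12";
        for (String token : tokenize(line)) {
            // Show line-break tokens in escaped form so they are visible in the output
            System.out.println(token.equals("\n") ? "\\n" : token);
        }
    }
}
```

Because spaces and tabs match neither alternative, the matcher simply steps over them, so each \n ends up as its own list element between word6, word7 and !word8!.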