I have a quick question about regexes in Java (though other languages are probably similar).
What I'm trying to do is to transform a String like this:
How are you "Doing well" How well 10 "That's great"
I want the regex to match each of the words, numbers, and quoted phrases separately. Ideally, I'd get something like this:
How
are
you
"Doing well"
How
well
10
"That's great"
The regex I'm trying to use is the following:
String regexPattern = "[^" +   // START_OR: start of line OR
        "\\s" +                // empty space OR
        "(\\s*?<=\")]" +       // END_OR: preceded by 0 or more spaces and a quotation mark
        "(\\w+)" +             // the actual word or number
        "[\\s" +               // START_OR: followed by a space OR
        "(?=\")" +             // followed by a quotation mark OR
        "$]";                  // END_OR: end of line
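For diagnosing what this pattern actually does, here is a small harness that runs it with Matcher.find() and collects group 1 of every match (the class and method names are just for illustration). Because each [...] is a character class, it consumes exactly one character, so the captures come out with their first character missing, and "That" is skipped entirely. Even a lone word like How produces no match, since the $ inside the brackets is a literal dollar sign rather than an end-of-line anchor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrokenPatternDemo {
    // The pattern from the question, concatenated into a single string literal.
    // Each [...] is a character class, so it matches exactly ONE character;
    // the (, *, ?, <, =, ", ) and $ inside the brackets are literal characters.
    static final String REGEX = "[^\\s(\\s*?<=\")](\\w+)[\\s(?=\")$]";

    // Collects capture group 1 of every match found in the input.
    static List<String> captures(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX).matcher(input);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "How are you \"Doing well\" How well 10 \"That's great\"";
        System.out.println(captures(input));
        // prints [ow, re, ou, oing, ell, ow, ell, 0, s, reat]

        System.out.println(captures("How"));
        // prints [] because nothing follows "How" for the last class to consume
    }
}
```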
This pattern doesn't work for me, though, even for much simpler strings! I've spent a lot of time looking for similar problems on here. If I didn't need to keep the quoted phrases together, I could just use a split; eventually, though, this pattern will get much more complicated, so I will need a regex (this is just the first iteration).
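As a quick illustration of why a plain split isn't enough here: splitting on runs of whitespace ignores the quotation marks entirely, so each quoted phrase is broken into pieces. The class and method names below are illustrative only.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on one or more whitespace characters; quotes get no special treatment.
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "How are you \"Doing well\" How well 10 \"That's great\"";
        System.out.println(Arrays.toString(splitOnWhitespace(input)));
        // prints [How, are, you, "Doing, well", How, well, 10, "That's, great"]
    }
}
```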
I'd appreciate any help; thanks in advance!
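I don't think [ ] means what you think it means. Inside square brackets, a leading ^ negates the character class, and characters such as (, ), *, ?, = and $ lose their special meaning and are matched literally. Each bracket expression therefore matches exactly one character; it can't express alternatives like "start of line OR a space". You should practice with smaller regexes before embarking on this task. The pattern you're looking for is more like: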
\s*([^"\s]+|"[^"]*")
You can see this in action here: http://rubular.com/r/enq7eXg9Zm.
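In Java, the same pattern can be driven with Matcher.find(), collecting group 1 of each match: either a run of characters that are neither quotes nor whitespace, or a whole double-quoted phrase with its quotes. A minimal sketch; the class name Tokenizer and the helper tokenize are illustrative, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // \s*([^"\s]+|"[^"]*") written as a Java string literal:
    // skip leading whitespace, then capture a bare token or a quoted phrase.
    private static final Pattern TOKEN = Pattern.compile("\\s*([^\"\\s]+|\"[^\"]*\")");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1)); // group 1 excludes the skipped whitespace
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "How are you \"Doing well\" How well 10 \"That's great\"";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Running main prints one token per line, matching the desired output from the question, with the quotes kept on the two phrases.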
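If you don't want symbols in words, then it's probably best to use a second regex that removes them, e.g. by replacing every match of the following (any character outside [a-zA-Z0-9_]) with an empty string: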
\W
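Combining the two steps might look like the sketch below. It assumes the cleanup should only apply to unquoted tokens (running \W over "That's great" would also strip the quotes, the apostrophe and the space), and it drops tokens that end up empty, such as a lone "-". Both choices, and the class and method names, are my own assumptions rather than part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CleanTokens {
    private static final Pattern TOKEN = Pattern.compile("\\s*([^\"\\s]+|\"[^\"]*\")");
    // \W matches any character that is not in [a-zA-Z_0-9]
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    static List<String> cleanTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String token = m.group(1);
            if (token.startsWith("\"")) {
                tokens.add(token); // assumption: leave quoted phrases untouched
            } else {
                String cleaned = NON_WORD.matcher(token).replaceAll("");
                if (!cleaned.isEmpty()) {
                    tokens.add(cleaned); // assumption: skip tokens that were all symbols
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(cleanTokens("How, are you? \"Doing well!\" 10% -"));
        // prints [How, are, you, "Doing well!", 10]
    }
}
```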