Regex-ing Words, Numbers, And Quotations from a string in Java


I have a quick question about Regex in Java (though other languages are probably similar).

What I'm trying to do is to transform a String like this:

 How are you "Doing well" How well 10 "That's great"

I want the Regex in Java to match all of the words, numbers, and things inside quotation marks. Ideally, I'd get something like this:

How
are
you
"Doing well"
How
well
10
"That's great"

The Regex I'm trying to use is the following:

String RegexPattern =   "[^"+           //  START_OR: start of line OR
                        "\\s" +         //  empty space OR
                        "(\\s*?<=\")]" + // END_OR: preceded by 0 or more spaces and a quotation mark 
                        "(\\w+)" +      // the actual word or number
                        "[\\s" +        // START_OR: followed by a space OR
                        "(?=\")" +      // followed by a quotation mark OR
                        "$]";           // END_OR:  end of line

This won't work for me, though, even for much simpler strings! I've spent a lot of time looking for similar problems on here. If I didn't need the quotations, I could just use a split; eventually, though, this pattern will get much more complicated, so I will need to use the Regex (this is just the first iteration).

I'd appreciate any help; thanks in advance!


Solution

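  • I don't think [ ] means what you think it means. Square brackets define a character class, which matches exactly one character from a set; they don't group alternatives. Inside square brackets, a leading ^ is actually a negation operator for the character class, not a start-of-line anchor. Alternation is written with | inside a ( ) group. You should practice with smaller regexes before embarking on this task. The pattern you're looking for is more like: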

        \s*([^"\s]+|"[^"]*")
    

    You can see this in action here: http://rubular.com/r/enq7eXg9Zm.
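
For reference, here is a minimal sketch of how that pattern could be applied in Java with `java.util.regex`. The class name `QuoteAwareTokenizer` and the method `tokenize` are made up for this illustration; only the regex itself comes from the answer above. Note that the backslashes and the double quotes have to be escaped inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
final class QuoteAwareTokenizer {
    // Skip leading whitespace, then capture either a run of characters that
    // are neither double quotes nor whitespace, or a double-quoted phrase
    // (the quotes are part of the captured group).
    private static final Pattern TOKEN =
            Pattern.compile("\\s*([^\"\\s]+|\"[^\"]*\")");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1)); // group 1 excludes the leading whitespace
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = " How are you \"Doing well\" How well 10 \"That's great\"";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Run on the string from the question, this should print the eight tokens one per line, with the two quoted phrases kept whole. The apostrophe in `That's` is not a double quote, so it does not end the quoted phrase.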

    If you don't want symbols in words, then it's probably best to run a second regex over each match that removes them (replace whatever it matches with the empty string), e.g.

        \W
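
A rough sketch of that second pass in Java, using `String.replaceAll`. The class `SymbolStripper` and the method `stripSymbols` are hypothetical names, and leaving quoted phrases untouched is an assumption on my part, since the answer doesn't say what should happen to them. By default Java's `\W` matches anything outside `[a-zA-Z_0-9]`, so accented letters would be stripped as well.

```java
// Hypothetical helper class, named only for this illustration.
final class SymbolStripper {
    // Removes every non-word character from an unquoted token.
    // Assumption: tokens that start with a double quote are quoted phrases
    // and are returned unchanged.
    static String stripSymbols(String token) {
        if (token.startsWith("\"")) {
            return token;
        }
        return token.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(stripSymbols("well,"));
        System.out.println(stripSymbols("10!"));
        System.out.println(stripSymbols("\"That's great!\""));
    }
}
```

A token made up entirely of symbols comes back as an empty string, so you may want to filter those out afterwards.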