Tags: regex, parsing, sentence

Regex for parsing a simple sentence of words delimited by double quotes


I have an example sentence that looks like this:

""Music"",""EDM / Electronic"",""organizer: Tiny Toons""

I want to parse this sentence into the tokens:

["Music", "EDM / Electronic", "organizer: Tiny Toons"]

My regex-fu is quite limited, and I'm under some time pressure.

I was wondering if someone could help me construct a regex (compatible with Java 8, since I'm using Clojure to apply it) to parse out these tokens.

Thank you, Jason.


Solution

  • Assuming the sentence is the entire string and that the tokens themselves contain no commas or double quotes, you could just use

    "[^,\"]+"
    

    If the above assumptions are not correct, please give examples of possible input strings and details of what characters can appear within the sections you want to match.

    A simple Java example of how to use the regex:

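    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Place the statements below inside a method, e.g. main.
    String sentence = "\"\"Music\"\",\"\"EDM / Electronic\"\",\"\"organizer: Tiny Toons\"\"";
    // Each match is a run of characters that are neither a comma nor a double quote.
    Matcher matcher = Pattern.compile("[^,\"]+").matcher(sentence);
    List<String> matches = new ArrayList<>();
    while (matcher.find()) {
        matches.add(matcher.group());
    }
    System.out.println(matches); // prints [Music, EDM / Electronic, organizer: Tiny Toons]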
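
If the assumptions above do not hold and a token may itself contain a comma, one possible variant is to match the doubled-quote delimiters explicitly and capture whatever sits between them. This is only a sketch: it assumes every token is wrapped in `""` exactly as in the question and never contains a double quote itself. The class name `QuotedTokens` and the method `parse` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedTokens {
    // Assumption: tokens are delimited by doubled quotes ("") and contain no quotes.
    // Group 1 captures the text between an opening "" and a closing "".
    private static final Pattern TOKEN = Pattern.compile("\"\"([^\"]*)\"\"");

    // Hypothetical helper: returns the captured tokens in order of appearance.
    public static List<String> parse(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(sentence);
        while (matcher.find()) {
            tokens.add(matcher.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The second token contains a comma, which the simpler pattern would split on.
        String sentence = "\"\"Music\"\",\"\"EDM, Electronic\"\",\"\"organizer: Tiny Toons\"\"";
        for (String token : parse(sentence)) {
            System.out.println(token);
        }
    }
}
```

Because the commas between tokens are never part of a match, they are skipped without being listed in the pattern. From Clojure the same pattern should be usable through Java interop or the built-in regex functions, though that is not shown here.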