java regex groovy basic

Regular expression to match a comma not inside a string literal


In BASIC, PRINT statements can look like this:

100 PRINT "Copyright, Adrian McMenamin","maybe"

This should put a tab space between the first string and the second.

I am working on a DSL/interpreter for BASIC in Groovy/Java which needs to parse this line and produce something like:

print "Copyright, Adrian McMenamin", "    ","maybe"

(Groovy interprets the comma as merely separating the parameters of the print function.)

So what regular expression will differentiate between the commas not inside the quote and those that are (don't worry about the PRINT or the line number etc)?


Solution

  • I implemented a very simple parser that just counts quote characters modulo 2: a comma seen after an even number of quotes lies outside any string literal. It works given the simple rules for a BASIC string literal. Before that, I had designed a very beautiful recursive function employing a regex, which worked correctly for this form:

    100 PRINT "Copyright, 2012", "Adrian McMenamin"
    

    But it failed for this:

    100 PRINT "Copyright, 2012"
    

    Oh well, guess that's just the limit of a DFA-like scheme.
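
For anyone who wants to try the quote-counting idea, here is a minimal sketch in Java. It is not the code from my interpreter: the class name `BasicPrintSplitter` and the method `splitOutsideQuotes` are made up for illustration, and it assumes the PRINT keyword and line number have already been stripped and that string literals contain no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: splits PRINT arguments
// at commas that lie outside string literals.
public class BasicPrintSplitter {

    public static List<String> splitOutsideQuotes(String args) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int quoteCount = 0; // number of '"' characters seen so far

        for (int i = 0; i < args.length(); i++) {
            char c = args.charAt(i);
            if (c == '"') {
                quoteCount++;
            }
            // An even quote count means we are outside a string literal,
            // so this comma is a separator rather than literal text.
            if (c == ',' && quoteCount % 2 == 0) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim()); // last (or only) argument
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts =
                splitOutsideQuotes("\"Copyright, Adrian McMenamin\",\"maybe\"");
        // Join the pieces with a quoted tab-like spacer between them.
        System.out.println("print " + String.join(", \"    \", ", parts));
        // prints: print "Copyright, Adrian McMenamin", "    ", "maybe"
    }
}
```

Because it only tracks the parity of the quote count, a line with a single string literal such as `"Copyright, 2012"` comes back as one piece, which is the case that tripped up the recursive regex version.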