Search code examples
regexbig-oebnf

EBNF for capturing a comma between two optional values


I have two optional values, and when both are present, a comma needs to be in between them. If one or both values are present, there may be a trailing comma, but if no values are present, no comma is allowed.

Valid examples:

(first,second,)
(first,second)
(first,)
(first)
(second,)
(second)
()

Invalid examples:

(first,first,)
(first,first)
(second,second,)
(second,second)
(second,first,)
(second,first)
(,first,second,)
(,first,second)
(,first,)
(,first)
(,second,)
(,second)
(,)
(,first,first,)
(,first,first)
(,second,second,)
(,second,second)
(,second,first,)
(,second,first)

I have EBNF code (XML-flavored) that suffices, but is there a way I can simplify it? I would like to make it more readable / less repetitive.

tuple ::= "(" ( ( "first" | "second" | "first" "," "second" ) ","? )? ")"

If it’s easier to understand in regex, here’s the equivalent code, but I need a solution in EBNF.

/\(((first|second|first\,second)\,?)?\)/

And here’s a helpful railroad diagram: "(" ( ( "first" | "second" | "first" "," "second" ) ","? )? ")"

This question becomes even more complex when we abstract it to three terms: "first", "second", and "third" are all optional, but they must appear in that order, separated by commas, with an optional trailing comma. The best I can come up with is a brute-force method:

"(" (("first" | "second" | "third" | "first" "," "second" | "first" "," "third" | "second" "," "third" | "first" "," "second" "," "third") ","?)? ")"

"(" (("first" | "second" | "third" | "first" "," "second" | "first" "," "third" | "second" "," "third" | "first" "," "second" "," "third") ","?)? ")" Clearly, a solution involving O(2n) complexity is not very desirable.


Solution

  • I found a way to simplify it, but not by much:

    "(" ( ("first" ("," "second")? | "second") ","? )? ")"
    

    "(" ( ("first" ("," "second")? | "second") ","? )? ")"

    For the three-term solution, take the two-term solution and prepend a first term:

    "(" (("first" ("," ("second" ("," "third")? | "third"))? | "second" ("," "third")? | "third") ","?)? ")"
    

    "(" (("first" ("," ("second" ("," "third")? | "third"))? | "second" ("," "third")? | "third") ","?)? ")"

    For any (n+1)-term solution, take the n-term solution and prepend a first term. This complexity is O(n), which is significantly better than O(2n).