Say I have the following EBNF:
document = content , { content } ;
content = hello world | answer | space ;
hello world = "hello" , space , "world" ;
answer = "42" ;
space = " " ;
This lets me parse something like:
hello world 42
Now I want to extend this grammar with a block comment. How can I do this properly?
If I start simple:
document = content , { content } ;
content = hello world | answer | space | comment ;
hello world = "hello" , space , "world" ;
answer = "42" ;
space = " " ;
comment = "/*" , { ? any character ? } , "*/" ;
I cannot parse:
hello /* I'm the taxman! */ world 42
If I extend the grammar to cover that special case, it gets ugly, but it parses:
document = content , { content } ;
content = hello world | answer | space | comment ;
hello world = "hello" , { comment } , space , { comment } , "world" ;
answer = "42" ;
space = " " ;
comment = "/*" , { ? any character ? } , "*/" ;
But I still cannot parse something like:
hel/*p! I need somebody. Help! Not just anybody... */lo world 42
How would I do this with an EBNF grammar? Or is it not even possible at all?
Assuming you consider "hello" a token, you would not want anything to break it up. If you do need to allow that, you have to explode the rule into single characters:
hello world = "h" , { comment } , "e" , { comment } , "l" , { comment } , "l" , { comment } , "o" ,
              { comment } , space , { comment } ,
              "w" , { comment } , "o" , { comment } , "r" , { comment } , "l" , { comment } , "d" ;
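Writing such an exploded rule by hand is tedious and error-prone, so it may be easier to generate it. The following is only a sketch: `explode` is a hypothetical helper (not part of any EBNF tool) that emits the right-hand side for one word, with an optional comment allowed between every pair of letters.

```python
def explode(word, filler="{comment}"):
    """Return an EBNF sequence that allows `filler` between each pair of letters."""
    # Quote every character as its own terminal, e.g. h -> "h".
    letters = [f'"{ch}"' for ch in word]
    # Join the terminals with the optional filler in between.
    return f", {filler}, ".join(letters)


print(explode("hello"))
# "h", {comment}, "e", {comment}, "l", {comment}, "l", {comment}, "o"
```

This only shows how quickly the grammar grows; it does not make the exploded form any more readable.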
On the broader question: language specifications commonly leave comments out of the formal grammar and describe them in a side note instead. It can still be done in the grammar, though, by treating a comment as equivalent to whitespace:
space = " " | comment ;
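In a hand-written lexer the same idea usually means skipping comments wherever a blank would be skipped. Here is a minimal sketch using Python's `re` module; the `TOKEN` pattern and the `tokenize` function are illustrative names, and the sketch assumes a comment runs to the first `*/` (no nesting).

```python
import re

# A "space" is either a single blank or a block comment (non-greedy, so it
# stops at the first "*/"). Everything else must be one of the known tokens.
TOKEN = re.compile(
    r"""
      (?P<space>\ |/\*.*?\*/)
    | (?P<word>hello|world|42)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(text):
    """Return the non-space tokens of `text`, or raise SyntaxError."""
    pos = 0
    tokens = []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        if match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("hello /* I'm the taxman! */ world 42"))
# ['hello', 'world', '42']
```

Because "hello" is matched as a whole token, an input such as `hel/*p!*/lo` is rejected, which matches the assumption that tokens cannot be broken apart.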
You may also want to consider adding a rule to describe consecutive whitespace:
spaces = { space }- ;
Cleaning up your final grammar, but treating "hello" and "world" as tokens (i.e. not allowing them to be broken apart), could result in something like this:
document = { content }- ;
content = hello world | answer | space ;
hello world = "hello" , spaces , "world" ;
answer = "42" ;
spaces = { space }- ;
space = " " | comment ;
comment = "/*" , { ? any character ? } , "*/" ;
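To check that this grammar behaves as intended, it can be turned into a small recursive-descent recognizer, one function per rule. This is a sketch under assumptions: all function names are my own, each function returns the position after a match or `None`, and a comment body is taken to be any run of characters up to the first `*/`.

```python
def literal(text, pos, lit):
    """Match the terminal `lit` at `pos`."""
    return pos + len(lit) if text.startswith(lit, pos) else None


def comment(text, pos):
    """comment = "/*" , any characters , "*/" (ends at the first "*/")."""
    if not text.startswith("/*", pos):
        return None
    end = text.find("*/", pos + 2)
    return None if end == -1 else end + 2


def space(text, pos):
    """space = " " | comment"""
    nxt = literal(text, pos, " ")
    return nxt if nxt is not None else comment(text, pos)


def spaces(text, pos):
    """spaces = { space }-  (one or more)"""
    nxt = space(text, pos)
    if nxt is None:
        return None
    while nxt is not None:
        pos = nxt
        nxt = space(text, pos)
    return pos


def hello_world(text, pos):
    """hello world = "hello" , spaces , "world" """
    pos = literal(text, pos, "hello")
    if pos is None:
        return None
    pos = spaces(text, pos)
    if pos is None:
        return None
    return literal(text, pos, "world")


def answer(text, pos):
    """answer = "42" """
    return literal(text, pos, "42")


def content(text, pos):
    """content = hello world | answer | space"""
    for rule in (hello_world, answer, space):
        nxt = rule(text, pos)
        if nxt is not None:
            return nxt
    return None


def parse_document(text):
    """document = { content }-  (True if the whole text matches)."""
    pos = 0
    matched_any = False
    while pos < len(text):
        pos = content(text, pos)
        if pos is None:
            return False
        matched_any = True
    return matched_any


print(parse_document("hello /* I'm the taxman! */ world 42"))  # True
print(parse_document("hel/*p!*/lo world 42"))                  # False
```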