How to use Space in a simple grammar


I'm a beginner at both ANTLR and EBNF.

I have the following grammar expressed in ANTLR 4:

grammar RecordGrammar;

Record: 'record';
EndRecord: 'endrecord';

Track: 'track';
EndTrack: 'endtrack';

Length: 'length';

Name: [a-zA-Z]+;
Number: [0-9]+;
WS: [ \t\r\n]+;

records: (record)+ EOF;

record: Record WS Name WS
            tracks WS?
        EndRecord WS?;

tracks: track WS? (track WS)*;

track: Track WS
          length
       EndTrack WS?;

length: Length WS Number WS?;

When I use the grammar above (with ANTLR) on this text:

record help
    track
     length 2
    endtrack
    track
       length 4
    endtrack
    track
       length 42
    endtrack
endrecord

...it works nice and dandy.

But I want to extend the 'Name' rule in the grammar so that it also accepts spaces.

So I want the grammar to accept this text as well:

record help me
    track
     length 2
    endtrack
    track
       length 4
    endtrack
    track
       length 42
    endtrack
endrecord

Note the text "help me" to the right of the record keyword.

How can I achieve this in the grammar? Since a space is a natural delimiter, I need some kind of special treatment for it in my rules. Thanks for any help I can get...


Solution

  • You could create a name parser rule that matches multiple Name tokens:

    name : Name (WS+ Name)*;
    

    But since you're not really doing anything with spaces, you might as well discard them during tokenisation by adding -> skip to the WS rule and then removing all WS references from your parser rules:

    grammar RecordGrammar;
    
    records     : record+ EOF;
    record      : Record name tracks EndRecord;
    tracks      : track+;
    track       : Track length EndTrack;
    length      : Length Number;
    name        : Name+;
    
    Record      : 'record';
    EndRecord   : 'endrecord';
    Track       : 'track';
    EndTrack    : 'endtrack';
    Length      : 'length';
    Name        : [a-zA-Z]+;
    Number      : [0-9]+;
    WS          : [ \t\r\n]+ -> skip;
    

    With this grammar, the text "help me" is tokenised as two Name tokens, which the name rule groups under a single node in the parse tree.
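
For comparison, here is a minimal sketch of the first option, where WS stays a real token and the only change to the grammar from the question is that record uses the new name rule instead of Name. The grammar name RecordGrammarWs is made up for illustration, and the whole thing should be treated as an untested sketch rather than a verified grammar:

```antlr
grammar RecordGrammarWs;   // hypothetical name, not from the original post

// Lexer rules as in the question. The keywords are listed before Name so
// that they win when both rules match the same text.
Record    : 'record';
EndRecord : 'endrecord';
Track     : 'track';
EndTrack  : 'endtrack';
Length    : 'length';
Name      : [a-zA-Z]+;
Number    : [0-9]+;
WS        : [ \t\r\n]+;    // no "-> skip": the parser still sees whitespace

records : record+ EOF;

// Only change to the parser rules: Name is replaced by the name rule.
record  : Record WS name WS
              tracks WS?
          EndRecord WS?;

tracks  : track WS? (track WS)*;
track   : Track WS length EndTrack WS?;
length  : Length WS Number WS?;

// One or more words separated by whitespace, e.g. "help me".
name    : Name (WS+ Name)*;
```

Both variants share one limitation: a word such as track or record inside the name is lexed as a keyword token rather than as Name, so a name like "help track" would not be matched by the name rule.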