
Parsing with incomplete grammars


Are there any common approaches to parsing with incomplete grammars? In my case I just want to detect methods in Delphi (Pascal) files, that is, procedures and functions. The following first attempt works:

    methods
      : ( procedure | function | . )+
      ;

But is that a real solution? Are there better ones? Is it possible to stop parsing with an action (e.g. after detecting implementation)? Does it make sense to use a preprocessor, and if so, how?


Solution

  • If you're only looking for names, then something as simple as this:

    grammar PascalFuncProc;
    
    parse
      :  (Procedure | Function)* EOF
      ;
    
    Procedure
      :  'procedure' Spaces Identifier
      ;
    
    Function
      :  'function' Spaces Identifier
      ;
    
    Ignore
      :  (StrLiteral | Comment | .) {skip();}
      ;
    
    fragment Spaces     : (' ' | '\t' | '\r' | '\n')+;
    fragment Identifier : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
    fragment StrLiteral : '\'' ~'\''* '\'';
    fragment Comment    : '{' ~'}'* '}';
    

    will do the trick. Note that I am not very familiar with Delphi/Pascal, so my StrLiteral and Comment rules are probably incomplete (Delphi also has (* ... *) and // comments, for example), but that is easily fixed.

    The lexer generated from the grammar above produces only two types of tokens, Procedure and Function. Everything else in the input (string literals, comments and, when neither of those matches, any single character matched by the wildcard .) is discarded by the lexer immediately through the skip() call.
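    To make the "skip everything else" idea concrete, here is a rough Python emulation of what the generated lexer does, using only the standard re module. It is a sketch, not ANTLR output: the helper name scan_methods and the regular expressions are illustrative assumptions that mirror the grammar's Procedure, Function and Ignore rules, including the same simplified string-literal and {...} comment handling and the same lowercase-only keywords.

```python
import re

# Hypothetical regex equivalents of the grammar's lexer rules (same simplifications).
STR_LITERAL = re.compile(r"'[^']*'")   # fragment StrLiteral : '\'' ~'\''* '\''
COMMENT = re.compile(r"\{[^}]*\}")     # fragment Comment    : '{' ~'}'* '}'
# Procedure / Function : keyword, Spaces, Identifier
METHOD = re.compile(r"(procedure|function)[ \t\r\n]+([A-Za-z_][A-Za-z0-9_]*)")


def scan_methods(source: str) -> list[tuple[str, str]]:
    """Return (token type, name) pairs, dropping everything else like skip() does."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = METHOD.match(source, pos)
        if m:
            kind = "Procedure" if m.group(1) == "procedure" else "Function"
            tokens.append((kind, m.group(2)))
            pos = m.end()
            continue
        # Ignore rule: a string literal, a comment, or else any single character.
        m = STR_LITERAL.match(source, pos) or COMMENT.match(source, pos)
        pos = m.end() if m else pos + 1
    return tokens


print(scan_methods("procedure Proc { function NotAFunction } x := 'function Nope';"))
# [('Procedure', 'Proc')]
```

    Trying the keyword rules first and only then falling back to "skip one literal, comment or character" is what keeps the function inside the comment and the one inside the string from being reported.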

    For input like this:

    some valid source
    { 
      function NotAFunction ...
    }
    
    procedure Proc
    Begin
      ...
    End;
    
    procedure Func
    Begin
      s = 'function NotAFunction!!!'
    End;
    

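    the resulting parse tree contains only two Procedure tokens, procedure Proc and procedure Func; the function inside the comment and the one inside the string literal never reach the parser.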
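    As for the preprocessor and stopping at implementation mentioned in the question: one possible approach, sketched here in Python with the standard re module rather than ANTLR, is to blank out comments and string literals first, cut the text at the implementation keyword, and only then search for procedure and function headers. The names preprocess and find_methods are hypothetical. The sketch assumes Delphi's {...}, (*...*) and // comments and ''-escaped string literals, matches keywords case-insensitively, and stops at implementation by truncating the input, not through an ANTLR action.

```python
import re

# Assumed Delphi "noise": string literals and the three comment forms.
# String literals come first in the alternation; since the regex scans left to right,
# whichever construct starts first wins, so '{' inside a string is not a comment.
NOISE = re.compile(
    r"'(?:[^']|'')*'"   # string literal, '' is an escaped quote
    r"|\{[^}]*\}"       # { ... } comment (also covers {$...} directives)
    r"|\(\*.*?\*\)"     # (* ... *) comment
    r"|//[^\r\n]*",     # // line comment
    re.DOTALL,
)
IMPLEMENTATION = re.compile(r"\bimplementation\b", re.IGNORECASE)
METHOD = re.compile(r"\b(procedure|function)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def preprocess(source: str) -> str:
    """Blank out strings and comments, then drop everything from 'implementation' on."""
    cleaned = NOISE.sub(" ", source)
    m = IMPLEMENTATION.search(cleaned)
    return cleaned[: m.start()] if m else cleaned


def find_methods(source: str) -> list[tuple[str, str]]:
    """Return (keyword, name) pairs for the headers left after preprocessing."""
    return [(kind.lower(), name) for kind, name in METHOD.findall(preprocess(source))]


example = """unit Example;

interface

procedure DoWork(X: Integer);
function Compute: Integer; // returns the answer

implementation

procedure Helper;
begin
  Writeln('procedure NotADeclaration');
end;

end.
"""
print(find_methods(example))  # [('procedure', 'DoWork'), ('function', 'Compute')]
```

    Doing the cleanup in a separate pass keeps the final search trivial, at the cost of scanning the text twice; the single-pass lexer in the grammar above avoids that second pass.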