
Problems matching text and containing syntax elements


I've written a small portion of a combined ANTLR4 grammar:

grammar TestCombined;

NL
    : [\r\n]
    ;

SUBHEADLINE
    : '##' .*? '##'
    ;

HEADLINE
    : '#' .*? '#'
    ;

LEAD
    : '###' .*? '###'
    ;

SUBHEADING
    : '####' .*? '####'
    ;

TEXT
    : .+?
    ;

/* ---- */

dnpMD
    : subheadline headline lead bodyElements*
    ;

subheadline
    : SUBHEADLINE NL NL
    ;

headline
    : HEADLINE NL NL
    ;

lead
    : LEAD NL NL
    ;

subheading
    : SUBHEADING
    ;

bodyElements
    : TEXT
    | subheading
    ;

The first three headline types work very well. Thanks to another question (and its answer), this is much clearer to me than before.

But I have trouble understanding why the TEXT rule/token is not matched correctly. I'm new to ANTLR4 and I think I'm missing something important that keeps me from understanding the underlying problem.

This is an example input:

## Test ##

# Test123 #

### Test1234 ###

#### Another Test ####

this is not getting recognized.

What am I missing? Is it impossible to express this in ANTLR4? The text may also contain further syntax elements, such as italics.


Solution

  • The current solution consists of the following lexer grammar and parser grammar:

    lexer grammar dnpMDAuslagernLexer;
    
    /*@members {
        public static final int COMMENTS = 1;
    }*/
    
    NL
        : [\r\n]
        ;
    
    SUBHEADLINE
        : '##' (~[\r\n])+? '##'
        ;
    
    HEADLINE
        : '#' ('\\#'|~[\r\n])+? '#'
        ;
    
    LEAD
        : '###' (~[\r\n])+? '###'
        ;
    
    SUBHEADING
        : '####' (~[\r\n])+? '####'
        ;
    
    CAPTION
        : '#####' (~[\r\n])+? '#####'
        ;
    
    LISTING
        : '~~~~~' .+? '~~~~~'
        ;
    
    ELEMENTPATH
        : '[[[[[' (~[\r\n])+? ']]]]]'
        ;
    
    LABELREF
        : '{##' (~[\r\n])+? '##}'
        ;
    
    LABEL
        : '{#' (~[\r\n])+? '#}'
        ;
    
    ITALIC
        : '*' (~[\r\n])+? '*'
        ;
    
    SINGLE_COMMENT
        : '//' (~[\r\n])+ -> channel(1)
        ;
    
    MULTI_COMMENT
        : '/*' .*? '*/' -> channel(1)
        ;
    
    STAR
        : '*'
        ;
    
    BRACE_OPEN
        : '{'
        ;
    
    TEXT
        : (~[\r\n*{])+
        ;
    
    parser grammar dnpMDAuslagernParser;
    
    options { tokenVocab=dnpMDAuslagernLexer; }
    
    dnpMD
        : head body
        ;
    
    head
        : subheadline headline lead
        ;
    
    subheadline
        : SUBHEADLINE NL+
        ;
    
    headline
        : HEADLINE NL+
        ;
    
    lead
        : LEAD
        ;
    
    subheading
        : SUBHEADING
        ;
    
    caption
        : CAPTION
        ;
    
    listing
        : LISTING (NL listingPath)? (NL label)? NL caption
        ;
    
    image
        : caption (NL label)? (NL imagePath)?
        ;
    
    listingPath
        : ELEMENTPATH
        ;
    
    imagePath
        : ELEMENTPATH
        ;
    
    labelRef
        : LABELREF
        ;
    
    label
        : LABEL
        ;
    
    italic
        : ITALIC
        ;
    
    singleComment
        : SINGLE_COMMENT
        ;
    
    multiComment
        : MULTI_COMMENT
        ;
    
    paragraph
        : TEXT? italic TEXT?
        | TEXT? STAR TEXT?
        | TEXT? labelRef TEXT?
        | TEXT? BRACE_OPEN TEXT?
        | TEXT? LABEL TEXT?
        | ELEMENTPATH
        | TEXT
        ;
    
    newlines
        : NL+
        ;
    
    body
        : bodyElements+
        ;
    
    bodyElements
        : singleComment
        | multiComment
        | paragraph
        | subheading
        | listing
        | image
        | newlines
        ;
    

    This grammar is working fine, and maybe someone can benefit from it.

    Thanks to all who helped out! Fabian
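
To see why the lexer rules above sort the example input into the right token types, here is a rough Python emulation of how an ANTLR lexer picks a token: at each position the longest match wins, and on a tie the rule listed first wins. This is only a sketch and not ANTLR itself. The regexes are hand translations of a subset of the lexer rules, and the names `RULES` and `tokenize` are made up for illustration.

```python
import re

# Hand-translated subset of the lexer rules from the solution, kept in the
# same order as in the grammar (the order matters for tie-breaking).
# Assumption: Python's lazy quantifiers behave closely enough to ANTLR's
# non-greedy subrules for these simple patterns.
RULES = [
    ("NL", r"[\r\n]"),
    ("SUBHEADLINE", r"##[^\r\n]+?##"),
    ("HEADLINE", r"#(?:\\#|[^\r\n])+?#"),
    ("LEAD", r"###[^\r\n]+?###"),
    ("SUBHEADING", r"####[^\r\n]+?####"),
    ("ITALIC", r"\*[^\r\n]+?\*"),
    ("STAR", r"\*"),
    ("BRACE_OPEN", r"\{"),
    ("TEXT", r"[^\r\n*{]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Return (rule name, matched text) pairs using longest-match-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            # Strictly longer only: on equal length the earlier rule stays.
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


if __name__ == "__main__":
    sample = (
        "## Test ##\n\n"
        "# Test123 #\n\n"
        "### Test1234 ###\n\n"
        "#### Another Test ####\n\n"
        "this is not getting recognized.\n"
    )
    for name, value in tokenize(sample):
        if name != "NL":
            print(name, repr(value))
```

In this emulation the five non-empty lines come out as SUBHEADLINE, HEADLINE, LEAD, SUBHEADING and TEXT. Each headline line is also a full-length TEXT match, so it only becomes a headline token because the headline rules are listed before TEXT. The same idea suggests why the original `TEXT : .+? ;` was of little use: a lazy `.+?` with nothing after it stops after a single character, so every TEXT token would hold just one character.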