How to identify and extract simple nested tokens with a BNF lexer?


I don't know where to find documentation on this. I recently discovered that most compilers use Backus–Naur Form (BNF) to describe the grammar of their language.

The Marpa::R2 Perl package includes this simple example, which parses arithmetic strings such as 42 * 1 + 7:

:default ::= action => [name,values]
lexeme default = latm => 1

Calculator ::= Expression action => ::first

Factor ::= Number action => ::first
Term ::=
    Term '*' Factor action => do_multiply
    | Factor action => ::first
Expression ::=
    Expression '+' Term action => do_add
    | Term action => ::first
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+   

I would like to modify this grammar so that it recursively parses an XML-like sample such as:

<foo>
    Some content here 
    <bar>
        I am nested into foo
    </bar>
    A nested block was before me.
</foo> 

and turns it into something like:

>(Some content here)
>>(I am nested into foo)
>(A nested block was before me)

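where I could use a function like this:

# Prefix each non-empty line of $content with $level '>' characters
# and wrap it in parentheses, as in the desired output above.
sub block {
    my ($content, $level) = @_;
    my @lines = grep { length } map { s/^\s+|\s+$//gr } split /\n/, $content;
    return join "\n", map { ('>' x $level) . "($_)" } @lines;
}

What would be a good starting point?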


Solution

  • There is an open-source Marpa-powered XML parser.
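
For the simplified tag format in the question, a small SLIF grammar may be enough. The following is a minimal sketch, not the parser mentioned above: the symbol names (Document, Element, Content, Item, OpenTag, CloseTag, Text), the My_Actions package, and the helpers render and parse_blocks are all hypothetical, and it assumes tags have no attributes and text never contains '<'. Because Marpa evaluates actions bottom-up, the nesting level is not known inside an action, so the sketch first builds a tree and then walks it to add the '>' prefixes.

```perl
use strict;
use warnings;
use Marpa::R2;

# Hypothetical grammar: an element is an opening tag, any mix of text and
# nested elements, and a closing tag. There is no :discard rule, because
# whitespace inside an element belongs to its text.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::first
lexeme default = latm => 1

Document ::= Element
Element  ::= OpenTag Content CloseTag action => do_element
Content  ::= Item*                    action => ::array
Item     ::= Element | Text

OpenTag  ~ '<' name '>'
CloseTag ~ '</' name '>'
Text     ~ [^<]+
name     ~ [\w]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Semantic action: build one tree node per element and check tag names.
sub My_Actions::do_element {
    my ( undef, $open, $content, $close ) = @_;
    my ($name) = $open =~ /<(\w+)>/;
    die "Mismatched tags: $open ... $close\n" unless $close eq "</$name>";
    return { name => $name, children => $content // [] };
}

# Walk the tree: text children are printed at the current level,
# nested elements one level deeper.
sub render {
    my ( $node, $level ) = @_;
    my @out;
    for my $child ( @{ $node->{children} } ) {
        if ( ref $child ) {
            push @out, render( $child, $level + 1 );
            next;
        }
        for my $line ( split /\n/, $child ) {
            $line =~ s/^\s+|\s+$//g;
            next if $line eq '';
            push @out, ( '>' x $level ) . "($line)";
        }
    }
    return @out;
}

sub parse_blocks {
    my ($input) = @_;
    $input =~ s/\A\s+|\s+\z//g;    # trim whitespace outside the root element
    my $recce = Marpa::R2::Scanless::R->new(
        { grammar => $grammar, semantics_package => 'My_Actions' } );
    $recce->read( \$input );
    my $value_ref = $recce->value;
    die "No parse\n" unless defined $value_ref;
    return render( ${$value_ref}, 1 );
}

print "$_\n" for parse_blocks(<<'END_OF_INPUT');
<foo>
    Some content here
    <bar>
        I am nested into foo
    </bar>
    A nested block was before me.
</foo>
END_OF_INPUT
```

The grammar itself does not check that opening and closing tag names agree, since that is not a context-free property; the sketch does that check in the action instead.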