
Trouble separating G0 and G1 rules in grammar


I'm trying to get what seems like a very basic Marpa grammar working. The code I use is below:

use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new(
    {
        source => \(<<'END_OF_SOURCE'),
            :start ::= ExprSingle
            ExprSingle ~ Expr AndExpr

            Expr ~ word

            AndExpr ~ word*
            word ~ [\w]+

            :discard ~ ws
            ws ~ [\s]+
END_OF_SOURCE
    }
);
my $reader = Marpa::R2::Scanless::R->new(
    {
        grammar => $grammar,
    }
);
my $input = 'foo';
$reader->read(\$input);
my $value = $reader->value;
print Dumper $value;

This prints

$VAR1 = \'foo';

so it recognizes one word just fine. But I want it to recognize a string of words:

my $input = 'foo bar';

Now the script prints:

Error in SLIF G1 read: Parse exhausted, but lexemes remain, at position 4

I think this is because ExprSingle is defined with the ~ operator, which makes it part of the tokenizing level (G0) instead of the structural level (G1). The :discard rule allows whitespace between G1 symbols, not inside G0 ones. So I change the grammar like so:

ExprSingle ::= Expr AndExpr

Now no error is printed, but the resulting value is undef instead of something containing 'foo' and 'bar'. I'm not sure what that means, since the earlier, failed parse threw an actual error.

I tried changing the grammar to separate what I think are G0 and G1 rules further, but still no luck:

:start ::= ExprSingle
ExprSingle ::= Expr AndExpr

Expr ::= token

AndExpr ::= token*
token ~ word
word ~ [\w]+

:discard ~ ws
ws ~ [\s]+

The final value is still undef. trace_terminals shows both 'foo' and 'bar' being accepted as tokens. What do I need to do to fix this grammar (by which I mean get a value containing the strings 'foo' and 'bar' instead of just undef)?


Solution

  • Rules by default return a value of undef, so in your case a return of \undef from $reader->value() means your parse succeeded. That is, a return of undef means failure, while a return of \undef means success where the parse evaluated to undef.

    A good, fast way to start with a more helpful semantics is to add the following line:

    :default ::= action => ::array

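    This makes every G1 rule return an array reference holding its children's values, so the parse produces a nested array that serves as a simple AST.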
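
A minimal sketch of the fix, assuming Marpa::R2 is installed and that lexemes keep their default value (the matched text). It takes the second grammar from the question, adds the :default line, and checks the return value of value() the way the answer describes: undef means no parse, a defined reference means success. The helper name parse_words is hypothetical and exists only for this illustration; it is not part of Marpa::R2.

```perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

# The second grammar from the question, plus a :default line so that every
# G1 rule returns an array reference of its children's values, not undef.
my $grammar = Marpa::R2::Scanless::G->new(
    {
        source => \(<<'END_OF_SOURCE'),
:default ::= action => ::array
:start ::= ExprSingle
ExprSingle ::= Expr AndExpr
Expr ::= token
AndExpr ::= token*
token ~ word
word ~ [\w]+
:discard ~ ws
ws ~ [\s]+
END_OF_SOURCE
    }
);

# Hypothetical helper for this sketch. Returns whatever value() returns:
# undef if no parse was found, otherwise a reference to the parse's value.
sub parse_words {
    my ($input) = @_;
    my $reader = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $reader->read( \$input );    # dies on a lexing or parse error
    return $reader->value;
}

my $value_ref = parse_words('foo bar');
die "No parse\n" unless defined $value_ref;

# Dereference once to get the value of the start rule (ExprSingle).
print Dumper( ${$value_ref} );
```

With this default, 'foo bar' should come back as a nested array along the lines of [ ['foo'], ['bar'] ]: one inner array for Expr and one for the AndExpr sequence.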