Search code examples
perlparsingambiguous-grammarmarpa

How do I make Marpa's sequence rules greedy?


I am working on a Marpa::R2 grammar that groups items in a text. Each group can only contain items of a certain kind, but is not explicitly delimited. This causes problems, because x...x (where . represents an item that can be part of a group) can be grouped as x(...)x, x(..)(.)x, x(.)(..)x, x(.)(.)(.)x. In other words, the grammar is highly ambiguous.

How can I remove this ambiguity if I only want the x(...)x parse, i.e. if I want to force a + quantifier to only behave “greedy” (as it does in Perl regexes)?

In the below grammar, I tried adding rank adverbs to the sequence rules in order to prioritize Group over Sequence, but that doesn't seem to work.

Below is a test case that exercises this behaviour.

use strict;
use warnings;

use Marpa::R2;
use Test::More;

my $grammar_source = <<'END_GRAMMAR';
inaccessible is fatal by default
:discard ~ space
:start ::= Sequence

Sequence
    ::= SequenceItem+  action => ::array
SequenceItem
    ::= WORD    action => ::first
    |   Group   action => ::first
Group
    ::= GroupItem+  action => [name, values]
GroupItem
    ::= ('[') Sequence (']')  action => ::first

WORD    ~ [a-z]+
space   ~ [\s]+
END_GRAMMAR

my $input = "foo [a] [b] bar";

diag "perl $^V";
diag "Marpa::R2 " . Marpa::R2->VERSION;

my $grammar = Marpa::R2::Scanless::G->new({ source => \$grammar_source });
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

$recce->read(\$input);

my $parse_count = 0;
while (my $value = $recce->value) {
    is_deeply $$value, ['foo', [Group => ['a'], ['b']], 'bar'], 'expected structure'
        or diag explain $$value;
    $parse_count++;
}
is $parse_count, 1, 'expected number of parses';

done_testing;

Output of the test case (FAIL):

# perl v5.18.2
# Marpa::R2 2.09
ok 1 - expected structure
not ok 2 - expected structure
#   Failed test 'expected structure'
#   at - line 38.
#     Structures begin differing at:
#          $got->[1][2] = Does not exist
#     $expected->[1][2] = ARRAY(0x981bd68)
# [
#   'foo',
#   [
#     'Group',
#     [
#       'a'
#     ]
#   ],
#   [
#     ${\$VAR1->[1][0]},
#     [
#       'b'
#     ]
#   ],
#   'bar'
# ]
not ok 3 - expected number of parses
#   Failed test 'expected number of parses'
#   at - line 41.
#          got: '2'
#     expected: '1'
1..3
# Looks like you failed 2 tests of 3.

Solution

  • Sequence rules are designed for non-tricky cases. Sequence rules can always be rewritten as BNF rules when the going gets tricky, and that is what I suggest here. The following makes your test work:

    use strict;
    use warnings;
    
    use Marpa::R2;
    use Test::More;
    
    my $grammar_source = <<'END_GRAMMAR';
    inaccessible is fatal by default
    :discard ~ space
    
    # Three cases
    # 1.) Just one group.
    # 2.) Group follows by alternating words and groups.
    # 3.) Alternating words and groups, starting with words
    Sequence ::= Group action => ::first
    Sequence ::= Group Subsequence action => [values]
    Sequence ::= Subsequence action => ::first
    
    Subsequence ::= Words action => ::first
    
    # "action => [values]" makes the test work unchanged.
    # The action for the next rule probably should be
    # action => [name, values] in order to handle the general case.
    Subsequence ::= Subsequence Group Words action => [values]
    
    Words ::= WORD+ action => ::first
    Group
    ::= GroupItem+  action => [name, values]
    GroupItem
    ::= ('[') Sequence (']')  action => [value]
    
    WORD    ~ [a-z]+
    space   ~ [\s]+
    END_GRAMMAR
    
    my $input = "foo [a] [b] bar";
    
    diag "perl $^V";
    diag "Marpa::R2 " . Marpa::R2->VERSION;
    
    my $grammar = Marpa::R2::Scanless::G->new( { source  => \$grammar_source } );
    my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    
    $recce->read( \$input );
    
    my $parse_count = 0;
    while ( my $value = $recce->value ) {
    is_deeply $$value, [ 'foo', [ Group => ['a'], ['b'] ], 'bar' ],
        'expected structure'
        or diag explain $$value;
    $parse_count++;
    } ## end while ( my $value = $recce->value )
    is $parse_count, 1, 'expected number of parses';
    
    done_testing;