
How to parse/identify a double-quoted string inside a larger expression using Marpa::R2 in Perl


I am having trouble parsing/identifying a double-quoted string inside a larger expression.

use strict;
use Marpa::R2;
use Data::Dumper;

my $grammar = Marpa::R2::Scanless::G->new({
    default_action => '[values]',
    source => \(<<'END_OF_SOURCE'),

:start ::= expression

expression ::= expression OP expression
expression ::= expression COMMA expression
expression ::= func LPAREN PARAM RPAREN
expression ::= PARAM
PARAM ::= STRING | REGEX_STRING

:discard    ~ sp
sp          ~ [\s]+

COMMA                      ~ [,]
STRING                     ~ [^ \/\(\),&:\"~]+
REGEX_STRING               ~ yet to identify
OP                         ~ ' - ' | '&'
LPAREN                     ~ '('
RPAREN                     ~ ')'
func                       ~ 'func'

END_OF_SOURCE
});

my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});

# Parses correctly: foo and bar are recognized as STRING lexemes.
my $input1 = "func(foo)&func(bar)";

# Here I want foo to be parsed as a REGEX_STRING lexeme.
# A REGEX_STRING is anything enclosed in double quotes.
my $input2 = "\"foo\"";

# Here func should be the func lexeme, ( should be LPAREN, ) should be RPAREN,
# foo should be REGEX_STRING, and - should be OP; likewise for func(\"bar\").
my $input3 = "func(\"foo\") - func(\"bar\")";

# Here func should be the func lexeme, ( should be LPAREN, ) should be RPAREN,
# and foo should be REGEX_STRING.
my $input4 = "func(\"foo\")";

my $input = $input1;    # switch to $input2, $input3, or $input4 to try the other cases
print "Trying to parse:\n$input\n\n";
$recce->read(\$input);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);

What I tried:

1st method:

I think my REGEX_STRING should be something like: REGEX_STRING ~ '\"([^:]*?)\"'

If I put the above REGEX_STRING into the code, with the input expression my $input4 = "func(\"foo\")";, I get an error like:

Error in SLIF parse: No lexeme found at line 1, column 5
* String before error: func(
* The error was at line 1, column 5, and at character 0x0022 '"', ...
* here: "foo")
Marpa::R2 exception

2nd method:

I tried including a rule like:

PARAM ::= STRING | REGEX_STRING
REGEX_STRING ::= '"' QUOTED_STRING '"'

STRING ~ [^ \/\(\),&:\"~]+
QUOTED_STRING ~ [^ ,&:\"~]+

The problem here is that the input is given as:

my $input4 = "func(\"foo\")";

This gives an error because there now seem to be two ways to tokenize the expression: either the leading text such as func( is taken as a QUOTED_STRING, since QUOTED_STRING allows parentheses, or func is taken as the func lexeme, ( as LPAREN, and so on.

How can I fix this?


Solution

    use 5.026;
    use strictures;
    use Data::Dumper qw(Dumper);
    use Marpa::R2 qw();
    
    my $grammar = Marpa::R2::Scanless::G->new({
        bless_package => 'parsetree',
        source        => \<<'',
    :default ::= action => [values] bless => ::lhs
    lexeme default = bless => ::name latm => 1
    :start ::= expression
    expression ::= expression OP expression
    expression ::= expression COMMA expression
    expression ::= func LPAREN PARAM RPAREN
    expression ::= PARAM
    PARAM ::= STRING | REGEXSTRING
    :discard    ~ sp
    sp          ~ [\s]+
    COMMA           ~ [,]
    STRING          ~ [^ \/\(\),&:\"~]+
    REGEXSTRING     ::= '"' QUOTEDSTRING '"'
    QUOTEDSTRING    ~ [^ ,&:\"~]+
    OP              ~ ' - ' | '&'
    LPAREN          ~ '('
    RPAREN          ~ ')'
    func            ~ 'func'
    
    });
    # say $grammar->show_rules;
    
    for my $input (
        'func(foo)&func(bar)', '"foo"', 'func("foo") - func("bar")', 'func("foo")'
    ) {
        my $r = Marpa::R2::Scanless::R->new({
            grammar => $grammar,
    #         trace_terminals => 1
        });
        $r->read(\$input);
        say "# $input";
        say Dumper $r->value;
    }
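
As far as I can tell, the line that makes the difference in the grammar above is `lexeme default = ... latm => 1`, which switches the lexer to longest acceptable token matching: only lexemes the parser can accept at the current position compete, so QUOTEDSTRING can no longer swallow `func(` at the start of the input. The sketch below is one way to check this under stated assumptions: it builds the same rules twice, with and without `latm => 1`, and reports which inputs parse. The helper names `build_grammar` and `try_parse` are hypothetical, and the `bless` adverbs are replaced with a plain `[name,values]` action to keep it short. I would expect the inputs that start with `func(` to fail without `latm`, but I have not listed the output here.

```perl
use strict;
use warnings;
use Marpa::R2;

# Build the answer's grammar, optionally with "lexeme default = latm => 1".
# build_grammar and try_parse are hypothetical helper names for this sketch.
sub build_grammar {
    my ($use_latm) = @_;
    my $lexeme_default = $use_latm ? "lexeme default = latm => 1\n" : '';
    my $source = ":default ::= action => [name,values]\n" . $lexeme_default . <<'END_OF_SOURCE';
:start ::= expression
expression ::= expression OP expression
expression ::= expression COMMA expression
expression ::= func LPAREN PARAM RPAREN
expression ::= PARAM
PARAM ::= STRING | REGEXSTRING
REGEXSTRING ::= '"' QUOTEDSTRING '"'
:discard     ~ sp
sp           ~ [\s]+
COMMA        ~ [,]
STRING       ~ [^ \/\(\),&:\"~]+
QUOTEDSTRING ~ [^ ,&:\"~]+
OP           ~ ' - ' | '&'
LPAREN       ~ '('
RPAREN       ~ ')'
func         ~ 'func'
END_OF_SOURCE
    return Marpa::R2::Scanless::G->new({ source => \$source });
}

# Return 1 if the input parses to a value, 0 if reading fails or no parse exists.
sub try_parse {
    my ($grammar, $input) = @_;
    my $ok = eval {
        my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
        $recce->read(\$input);
        defined( $recce->value ) ? 1 : 0;
    };
    return $ok ? 1 : 0;
}

my %grammar_for = (
    'with latm'    => build_grammar(1),
    'without latm' => build_grammar(0),
);

for my $input ('func(foo)&func(bar)', '"foo"', 'func("foo") - func("bar")', 'func("foo")') {
    for my $label ('with latm', 'without latm') {
        printf "%-12s %-28s %s\n", $label, $input,
            try_parse($grammar_for{$label}, $input) ? 'parsed' : 'failed';
    }
}
```

If a comparison like this confirms the difference, uncommenting `trace_terminals => 1` in the recognizer, as in the answer's code, should show which lexemes were tried and rejected at each position.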