I've got a problem with using a reserve (backslash) declaration for priority disambiguation. Below is a self-contained example. The production 'IPv4Address' is a strict subset of 'Domain0'. In parsing URLs, though, you want dotted-quad addresses to be handled differently than domain names, so you want to split 'Domain0' into two parts; 'Domain1' is one of those two parts. The included test suite, however, fails at 't3()', where 'Domain1' accepts an IP address that looks like it should be excluded.
Is this a problem with the reserve declaration, or is this a defect in the current version of Rascal? I'm on the 0.10.x unstable branch at present, per advice to see if that corrected a different problem (with the Tutor). I haven't checked with the stable branch since keeping them both installed means parallel Eclipse environments, which I haven't been motivated to do.
module grammar_test
import ParseTree;
syntax Domain0 = { Subdomain '.' }+;
syntax Domain1 = Domain0 \ IPv4Address ;
lexical Subdomain = [0-9A-Za-z]+ | [0-9A-Za-z]+'-'[a-zA-Z0-9\-]*[a-zA-Z0-9] ;
lexical IPv4Address = DecimalOctet '.' DecimalOctet '.' DecimalOctet '.' DecimalOctet ;
lexical DecimalOctet = [0-9] | [1-9][0-9] | '1'[0-9][0-9] | '2'[0-4][0-9] | '25'[0-5] ;

test bool t1()
{
    return parseAccept(#IPv4Address, "192.168.0.1");
}

test bool t2()
{
    return parseAccept(#Domain0, "192.168.0.1");
}

test bool t3()
{
    return parseReject(#Domain1, "192.168.0.1");
}

bool parseAccept( type[&T<:Tree] begin, str input )
{
    try
    {
        parse(begin, input, allowAmbiguity=false);
    }
    catch ParseError(loc _):
    {
        return false;
    }
    return true;
}

bool parseReject( type[&T<:Tree] begin, str input )
{
    try
    {
        parse(begin, input, allowAmbiguity=false);
    }
    catch ParseError(loc _):
    {
        return true;
    }
    return false;
}

This example has been cut down from larger code, where I first encountered the error. Using the rule "IPv4Address | Domain1" was throwing an Ambiguity exception, which I tracked down to "Domain1" accepting something it shouldn't. Curiously, "IPv4Address > Domain1" was also throwing Ambiguity, but I'm guessing this has the same root cause as the present isolated example.
The difference operator for keyword reservations currently only works correctly if the right-hand side is a finite language expressed as a disjunction of literal keywords, like "if" | "then" | "while", or a non-terminal which is defined like that: lexical X = "if" | "then" | "while". Then you can write A \ X for some effect.
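As a minimal sketch of the case where the reservation does take effect, the fragment below subtracts a finite set of literal keywords from an identifier rule. The module name, Reserved, Id, and isId are hypothetical names chosen for illustration, and the sketch assumes Rascal's keyword declaration form for the finite set.

```rascal
module reserve_sketch

import ParseTree;

// Hypothetical finite keyword set: only literals, so the \ constraint applies.
keyword Reserved = "if" | "then" | "while";

// Hypothetical identifier rule: lowercase letters, longest match,
// minus anything listed in Reserved.
lexical Id = ([a-z]+ !>> [a-z]) \ Reserved;

// Returns true when the whole input parses as an Id, false on a parse error.
bool isId(str input) {
  try {
    parse(#Id, input);
    return true;
  }
  catch ParseError(loc _): {
    return false;
  }
}
```

The intent is that Id rejects a reserved word such as "while" but still accepts an ordinary identifier such as "whilst".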
For other types of non-terminals the parser is still generated, but the \ constraint has no effect. You wrote Domain0 \ IPv4Address, and IPv4Address does not satisfy the above assumption.
(We should either add a warning about that or generate a parser which can implement the full semantics of language difference; but that's for another time).
Admittedly, such a powerful difference operator could be used to express some order of preference between non-terminals. Alas.
Possible (sketches of) solutions:
Parse using the Subdomain syntax, then pattern match and rewrite all quadruples to IPv4Address in a single pass.
Add a follow restriction, as in {Subdomain !>> [.][0-9] "."}+, or something in that vein.
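A simpler variant of the parse-first idea can be sketched as a check after parsing, rather than a tree rewrite: accept the input as a domain name only if it parses as Domain0 and does not also parse as IPv4Address. The grammar rules are copied from the question; the module name and the helpers parses and isDomainName are hypothetical, and this is only a stand-in for the intended Domain1, not a replacement for the difference operator.

```rascal
module domain_filter_sketch

import ParseTree;

// Grammar rules copied from the question.
syntax Domain0 = { Subdomain '.' }+;
lexical Subdomain = [0-9A-Za-z]+ | [0-9A-Za-z]+'-'[a-zA-Z0-9\-]*[a-zA-Z0-9] ;
lexical IPv4Address = DecimalOctet '.' DecimalOctet '.' DecimalOctet '.' DecimalOctet ;
lexical DecimalOctet = [0-9] | [1-9][0-9] | '1'[0-9][0-9] | '2'[0-4][0-9] | '25'[0-5] ;

// Hypothetical helper: true when the whole input parses as the given
// non-terminal, false when the parser reports a ParseError.
bool parses(type[&T<:Tree] begin, str input) {
  try {
    parse(begin, input, allowAmbiguity=false);
    return true;
  }
  catch ParseError(loc _): {
    return false;
  }
}

// Hypothetical stand-in for Domain1: a Domain0 that is not also an IPv4Address.
bool isDomainName(str input)
  = parses(#Domain0, input) && !parses(#IPv4Address, input);
```

This keeps the grammar free of the unsupported difference and moves the exclusion into ordinary code, at the cost of parsing the input twice.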