Search code examples
rascal

Rascal ambiguity not resolved by disambiguation rules


I'm trying to get a disambiguation working, one in the same vein as the question I asked a few days ago. In that previous question, there was an undocumented limitation in the language implementation; I'm wondering if there's something similar going on here.

Tests [tuvw]1 are all throwing ambiguity exceptions (BTW: How do you catch those? [Edit: answered]). All of them look like they ought to pass. Note that they have to be unambiguous in order to pass. Neither the priority rule Scheme nor the reserve rules UnknownScheme[23] seem to be removing the ambiguity. There might be some interaction with follow rules I'm not understanding; it might be another limitation or a defect. What's up?

I'm on the unstable branch. Version (from Eclipse): 0.10.0.201806220838

EDIT. I modified the example code to more clearly highlight what's happening. I removed some redundant tests and the tests that were behaving correctly. I expanded some possibly-verbose diagnostics. I changed the exposition above to match. Newer results follow.

It looks like there are two different things at play here. "http" is being accepted (correctly) by both KnownScheme and UnknownScheme in tests s1[ab]. It seems to be behaving as if the priority declaration in Scheme just isn't functioning, as if > is being substituted with |.

In the other case, tests s1[cde] are failing, but s1f is passing. This looks even more like a defect. It's possible to reserve a single keyword, apparently, but not more than one. Since the various reservation declarations are failing, it's no surprise that there's an ambiguity when put into an alternative.

module ssce

import analysis::grammars::Ambiguity;
import IO;

lexical Scheme = AnyScheme ; 
lexical AnyScheme = KnownScheme > UnknownScheme ;
lexical AnySchemeChar = [a-z*];
lexical KnownScheme = KnownSchemes !>> AnySchemeChar ;  
lexical KnownSchemes = "http" | "https" | "http*" | "javascript" ;
lexical UnknownScheme = UnknownFixedScheme | UnknownWildScheme ;
lexical UnknownFixedScheme = [a-z]+ !>> AnySchemeChar ;
lexical UnknownWildScheme = [a-z]* '*' AnySchemeChar* !>> AnySchemeChar ;

lexical Scheme2 = UnknownScheme2 | KnownScheme ;
lexical UnknownScheme2 = UnknownScheme \ KnownSchemes ;
lexical Scheme3 = UnknownScheme3 | KnownScheme ;
lexical UnknownScheme3 = AnySchemeChar+ \ KnownSchemes ;
lexical Scheme4 = UnknownScheme4 | KnownScheme ;
lexical UnknownScheme4 = AnySchemeChar+ \ ("http"|"https") ;
lexical Scheme5 = UnknownScheme5 | KnownScheme ;
lexical UnknownScheme5 = AnySchemeChar+ \ "http" ;

test bool t1() { return parseAccept( #Scheme, "http" ); }
test bool u1() { return parseAccept( #Scheme2, "http" ); }
test bool v1() { return parseAccept( #Scheme3, "http" ); }
test bool w1() { return parseAccept( #Scheme4, "http" ); }
test bool x1() { return parseAccept( #Scheme5, "http" ); }
test bool s1a() { return parseAccept( #KnownScheme, "http" ); }
test bool s1b() { return parseAccept( #UnknownScheme, "http" ); }
test bool s1c() { return parseReject( #UnknownScheme2, "http" ); }
test bool s1d() { return parseReject( #UnknownScheme3, "http" ); }
test bool s1e() { return parseReject( #UnknownScheme4, "http" ); }
test bool s1f() { return parseReject( #UnknownScheme5, "http" ); }

bool verbose = false;

bool parseAccept( type[&T<:Tree] begin, str input )
{
    try
    {
        parse(begin, input, allowAmbiguity=false);
    }
    catch ParseError(loc _):
    {
        return false;
    }
    catch Ambiguity(loc l, str a, str b):
    {
        if (verbose)
        {
            println("[Ambiguity] " + a + ", " + b);
            Tree tt = parse(begin, input, allowAmbiguity=true) ;
            iprintln(tt);
            list[Message] m = diagnose(tt) ;
            println( ToString(m) );
        }
        fail;
    }
    return true;
}

bool parseReject( type[&T<:Tree] begin, str input )
{
    try
    {
        parse(begin, input, allowAmbiguity=false);
    }
    catch ParseError(loc _):
    {
        return true;
    }
    return false;
}

str ToString( list[Message] msgs ) =
    ( ToString( msgs[0] ) | it + "\n" + ToString(m) | m <- msgs[1..]  );

str ToString( Message msg)
{
    switch(msg)
    {
        case error(str s, loc _): return "error: " + s;
        case warning(str s, loc _): return "warning: " + s;
        case info(str s, loc _): return "info: " + s;
    }
    return "";
}

Solution

  • I've been making this ambiguity diagnostics tool, and here's what it came up with for your grammar. It seems you've discovered more things we need to document and write little checkers for.

    Well-formedness of \ is murky.

    The problem is that the \ operator only accepts literal strings, such as A \ "a" \ "b" or a keyword non-terminal defined like keyword Hello = "a" | "b";, used as A \ Hello, and nothing else. So also A \ ("a" | "b") is not allowed, and also indirect non-terminals like A \ Hello where lexical Hello = Bye; lexical Bye = "if" | "then"; also not allowed. Only the simplest of the simplest forms.

    Well-formedness of follow-restrictions

    Similar rules for !>> disallow any non-terminal to the right of the !>> operator.

    So [a-z]+ !>> [a-z] or [a-z]+ !>> "*", but not [a-z]+ \ myCharClass where lexical myCharClass = [a-z];

    Names for character-classes is on our todoy list; but they will not be like non-terminals. More like aliases which will be substituted at parser generator time.

    Whole words

    Keyword reservation only works if you subtract the sentence from the whole word. Sometimes you have to group non-terminals to get this right:

    • lexical Ex = ([a-z]+ "*") \ "https*" instead of lexical Ex = [a-z]+ "*" \ "https*")

    The latter would try to subtract the "https*" language from the "*" language. The first works.

    case-insensitivity

    • 'if' is defined by lexical 'if' = [iI][fF];
    • "if" is defined by lexical "if" = [i][f];
    • '*' is defined by lexical '*' = [*];
    • "*" is defined by lexical "*" = [*];

    New grammar

    I used a random generator to generate all the ambiguities I could find, and resolved them step by step by adding keyword reservation:

    lexical Scheme = AnyScheme ; 
    lexical AnyScheme = KnownScheme > UnknownScheme ;
    lexical AnySchemeChar = [a-z*];
    lexical KnownScheme = KnownSchemes !>> AnySchemeChar ;  
    keyword KnownSchemes = "http" | "https" | "http*" | "javascript" ;
    lexical UnknownScheme = UnknownFixedScheme | UnknownWildScheme ;
    lexical UnknownFixedScheme = [a-z]+ !>> AnySchemeChar \ KnownSchemes ;
    lexical UnknownWildScheme = ([a-z]* '*' AnySchemeChar*) !>> AnySchemeChar  \ KnownSchemes ;