Search code examples
parsingyaccparser-generatorlalr

How to test grammar changes in yacc/lalr(1) for backwards compatibilitiy?


We have our scripting language that is already in use and we are improving it by adding new features etc.

My question is; What is the best way to test our grammer (not end app) for backwards compatibility? Anybody knows a tool that makes it for us or a way to do it?

Best regards


Solution

  • Testing whether one grammar accepts the same language as another, or accepts a larger language, is difficult if not impossible in theory.

    As engineers, we are often asked to do the impossible. What we do is relax the requirements until we get some kind of useful answer. One way to relax the requirements is to allow a tool to say "don't know" in some cases.

    My company builds parsers based on parser generators. Often we deal with huge grammars (thousands of rules). One of the mechanisms we have been working on is to detect if a grammar is ambiguous. This is already known to be impossible in theory, but doesn't change our interest in getting an answer.

    We are working on a tool that can answer this question in many cases. In effect what it does is ask if any nonterminal is ambiguous; applied to the root grammar rule this directly asks if the grammar is ambiguous. The reason for trying it on all nonterminals is that many of them produce smaller sublanguages than the full language, allowing them to be analyzed. It determines this by doing a breadth-first search on expansions of the nonterminal using grammar rules to expand. One of several things occurs during this search:

    • the nonterminal is shown to be not-ambiguous
    • it is shown to be ambiguous, which implies the main grammar rule is ambiguous
    • the search runs out of a time we are willing to spend on it.

    By recording the search results, and iterating multiple times (using a search technique called "iterative deepening") over all the grammar rules, we often can find ambiguities, and/or prove that parts of the grammar are not ambiguous. (Having cached the fact that a terminal/nonterminal X is or is not ambiguous allows checking other nonterminals Y that transitively use X, to search effectively "faster" or "deeper"; this is the old trick of transposition tables in chess programs). This answer isn't perfect, but when it identifies an ambiguity there really is one, and when it claims there isn't one, there isn't one. That's a big help.

    It seems to me the same type of search should be applicable to asking if one grammar accepts a superset of another, under the assumption that the grammars are fairly similar (e.g., you got one by modifying the other "slightly"). The search has to check for each nonterminal shared by the languages if that nonterminal is a superset of its twin. Again, any answer you get would not be perfect, but it might go a long way in providing faith in the compatibility.

    The only other way we know how do to this is to run a very large set of comprehensive tests, as @EJP has noted.