compiler-construction, grammar, context-free-grammar, finite-automata

How to extract the grammar from a compiler


We are currently working on a software modernization project, and we are unable to write a grammar by hand for each and every statement in programs written in outdated languages like PL/1. Is there a way to extract the grammar from the compiler instead?


Solution

  • You are not going to be able to reverse engineer a PL/1 compiler binary back into a grammar without spending enormous resources.

    Get a PL/1 manual (IBM offers them) and use it to define a grammar.

    Once you start building a grammar, you're going to discover that PL/1 is extremely hard to parse: it has NO reserved keywords. Every "keyword" in the language can also be used as a variable name. This is legal:

          IF BEGIN>END*PROCEDURE(PUT) THEN GOTO CALL;
    

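    A conventional parser generator cannot handle this: its lexer has to decide whether a word is a keyword before the parser knows the context in which the word appears.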

    Another issue you will face is PL/1's preprocessor. Its directives are rare in any individual PL/1 source file, but a big software system (the kind typically undergoing modernization) pretty much always contains some.

    (Been there, done a full PL/1 grammar and front end. Check my bio for more details.)
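
To make the "no reserved keywords" problem concrete, here is a minimal Python sketch of deciding a word's role from its position in the statement rather than from its spelling. It is an illustration only, not how a production PL/1 front end works: the names `classify` and `STATEMENT_OPENERS` are made up for this example, it handles only `IF ... THEN`, `GOTO`, `CALL` and simple assignments, and it assumes a word followed by `=` at the start of a statement is an assignment target.

```python
import re

# Words that may open a statement in this sketch. None of them is reserved,
# so each one is also a legal variable name.
STATEMENT_OPENERS = {"IF", "GOTO", "CALL"}

# A word, a run of digits, or any single non-blank character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(text):
    """Split a statement into raw tokens without deciding what they mean."""
    return TOKEN_RE.findall(text)


def classify(text):
    """Return (token, role) pairs, choosing each word's role from context."""
    tokens = tokenize(text)
    roles = []
    state = "stmt_start"  # or "operand_expected" / "after_operand"
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok[0].isalpha() or tok[0] == "_":
            word = tok.upper()
            if state == "stmt_start" and word in STATEMENT_OPENERS and nxt != "=":
                # Keyword only because it opens a statement and is not assigned to.
                roles.append((tok, "keyword"))
                state = "operand_expected"
            elif state == "after_operand" and word == "THEN":
                # An operand cannot directly follow another operand, so THEN
                # here must end the condition and start the nested statement.
                roles.append((tok, "keyword"))
                state = "stmt_start"
            else:
                roles.append((tok, "identifier"))
                state = "after_operand"
        elif tok.isdigit():
            roles.append((tok, "number"))
            state = "after_operand"
        else:
            roles.append((tok, "punct"))
            if tok == ";":
                state = "stmt_start"
            elif tok in ")]":
                state = "after_operand"
            else:
                state = "operand_expected"
    return roles


if __name__ == "__main__":
    for tok, role in classify("IF BEGIN>END*PROCEDURE(PUT) THEN GOTO CALL;"):
        print(f"{tok:10} {role}")
```

Run on the example statement, this marks `IF`, `THEN` and `GOTO` as keywords and `BEGIN`, `END`, `PROCEDURE`, `PUT` and `CALL` as identifiers, while `IF = 1;` classifies `IF` as an identifier. A real grammar needs this kind of contextual decision for every statement form in the language, which is why a fixed keyword table in the lexer is not enough.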