
Self-referencing PetitParser's PPCompositeParsers


I have a programming language grammar that I would like to split into several subclasses of PPCompositeParser (e.g., one class will handle instructions, another will handle expressions, and another will handle program structure). I want to do this to avoid ending up with one big class with tens of instance variables.

My problem is that these sub-grammars have a cyclic dependency: the structure grammar references the 'statement' rule of the statement grammar, which references the 'expression' rule of the expression grammar, which references the 'subroutineName' rule of the structure grammar (closing the dependency cycle). I tried the simple approach of having, e.g., a #subroutineName method in the expression grammar which looks like:

MyExpressionGrammar>>subroutineName
  ^ N2TJStructureParser newStartingAt: #subroutineName

but that fails at initialization because of an infinite recursion (obviously).

To solve this problem, I created a PPDeferedParser:

PPParser subclass: #PPDeferedParser
    instanceVariableNames: 'creationBlock'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitParser-Tools'

PPDeferedParser>>parseOn: aStream
    ^ creationBlock value parseOn: aStream

which makes the previous #subroutineName look like:

MyExpressionGrammar>>subroutineName
  ^ PPDeferedParser creationBlock: [N2TJStructureParser newStartingAt: #subroutineName]

This seems to work but I wonder if there is any other solution.


Solution

  • Currently splitting a composite parser into multiple PPCompositeParser subclasses is not directly supported by PetitParser.

    Keep in mind that if you use the PetitParser browser you don't need to bother about instance-variables, they are automatically managed for you. Furthermore, you don't necessarily need an instance variable for every production. For example, terminals can be in methods that you call directly.

    Your solution certainly works too, but it is not that nice because it requires you to pay careful attention to how you connect your grammars. Also, your implementation should lazily cache the result; otherwise your code creates a new composite parser every time the deferred parser is invoked during parsing, which is very expensive.

    All this aside, it would certainly be possible to improve PPCompositeParser to support dependencies between multiple subclasses, for example by declaring dependent other parsers that the constructor would prepare, initialize and eventually resolve.
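
The lazy caching mentioned above could be sketched roughly as follows. This is only a minimal sketch based on the PPDeferedParser from the question: the extra instance variable `parser` and the `setCreationBlock:` setter are hypothetical names, and the class-side `creationBlock:` constructor is assumed to look like this because the question uses it without showing it.

```smalltalk
"Assumption: same class as in the question, plus a 'parser' variable for the cache."
PPParser subclass: #PPDeferedParser
    instanceVariableNames: 'creationBlock parser'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitParser-Tools'

"Assumed constructor; the question calls it but does not show it."
PPDeferedParser class>>creationBlock: aBlock
    ^ self new setCreationBlock: aBlock

"Hypothetical setter that stores the block without evaluating it."
PPDeferedParser>>setCreationBlock: aBlock
    creationBlock := aBlock

"Evaluate the block only on first use, then reuse the cached parser."
PPDeferedParser>>parseOn: aStream
    ^ (parser ifNil: [ parser := creationBlock value ]) parseOn: aStream
```

With this change, the block passed in #subroutineName is evaluated at most once per PPDeferedParser instance, so the other composite parser is built on the first parse that reaches it rather than on every invocation. Initialization still does not recurse, because nothing evaluates the block until parsing starts.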