
How to describe a conditional statement (if-then-else) using PEG


I'm working on a parser for Qt's qmake project files (an open-source project), and I'm having trouble describing qmake's variant of the conditional statement, which the documentation calls a "scope".

EBNF (simplified):

ScopeStatement -> Condition ScopeBody

Condition -> Identifier | TestFunctionCall | NotExpr | OrExpr | AndExpr
NotExpr -> "!" Condition
OrExpr   -> Condition "|" Condition
AndExpr -> Condition ":" Condition

ScopeBody -> COLON Statement | BLOCK_OPEN Statement:* BLOCK_CLOSE

Statement -> AssignmentStatement
AssignmentStatement -> Identifier EQ String

// There are many other built-in boolean functions
TestFunctionCall -> ("defined" | ...)  ARG_LIST_OPEN (String COMMA:?):* ARG_LIST_CLOSE

Identifier -> Letter (Letter | Digit | UNDERSCP):+
String -> (Letter | Digit | UNDERSCP):+

EQ -> "="
COLON -> ":"
COMMA -> ","
ARG_LIST_OPEN -> "("
ARG_LIST_CLOSE -> ")"
BLOCK_OPEN -> "{"
BLOCK_CLOSE -> "}"
UNDERSCP -> "_"

First question: how can I distinguish the colon used as the AND operator from the colon that terminates the condition? Is it possible?

P.S. My grammar draft (without function call support) doesn't work even for a simple case like

win32:xml: x = y

PEG.js code:

Start
  = ScopeStatement

// qmake scope statement
ScopeStatement
  = BooleanExpression ws* ((":" ws* SingleLineStatement) / ("{" ws* MultiLineStatement ))

SingleLineStatement
  = Identifier ws* "=" ws* Identifier lb* 

MultiLineStatement
  = (SingleLineStatement lb*)+

// qmake condition statement
BooleanExpression
  = BooleanOrExpression

BooleanOrExpression
  = left:BooleanAndExpression ws* "|" ws* right:BooleanOrExpression  { return {type: "OR", left:left, right:right} }
  / BooleanAndExpression

BooleanAndExpression
  = left:BooleanNotExpression ws* ":" ws* right:BooleanAndExpression  { return {type: "AND", left:left, right:right} }
  / BooleanNotExpression


BooleanNotExpression
  = "!" ws* operand:BooleanNotExpression { return {type: "NOT", operand: operand } }
  / BooleanComplexExpression


BooleanComplexExpression
  = Identifier
  / "(" logical_or:BooleanOrExpression ")" { return logical_or; }

Identifier
  = token:[a-zA-Z0-9_]+ { return token.join(""); }

ws 
  = [ \t]

lb
  = [\r\n]

Thanks!


Solution

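  • In PEG, a rule that has already succeeded is not re-entered to try a different match, so BooleanAndExpression greedily treats every ":" as an AND operator and also swallows the colon that ends the condition (for win32:xml: x = y it takes x as an operand and the parse then fails at "="). Add a negative lookahead inside BooleanAndExpression, just before the ":", that rejects a colon followed by a SingleLineStatement; that colon is then left for the scope body. Note that the grammar below only handles the single-line ":" form: MultiLineStatement is defined but not referenced by Statement.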

    Start
      = ScopeStatement
    
    // qmake scope statement
    ScopeStatement
      = bool:BooleanExpression ws* state:Statement  { return {bool:bool, state:state} }
    
    Statement
      = ":" ws* state:SingleLineStatement  { return state }
    
    SingleLineStatement
      = left:Identifier ws* "=" ws* right:Identifier lb*  { return {type: "ASSIGN", left:left, right:right} }
    
    MultiLineStatement
      = (SingleLineStatement lb*)+
    
    // qmake condition statement
    BooleanExpression
      = BooleanOrExpression
    
    BooleanOrExpression
      = left:BooleanAndExpression ws* "|" ws* right:BooleanOrExpression  { return {type: "OR", left:left, right:right} }
      / BooleanAndExpression
    
    BooleanAndExpression
      = left:BooleanNotExpression ws* !(":" ws* SingleLineStatement) ":" ws* right:BooleanAndExpression  { return {type: "AND", left:left, right:right} }
      / BooleanNotExpression
    
    
    BooleanNotExpression
      = "!" ws* operand:BooleanNotExpression { return {type: "NOT", operand: operand } }
      / BooleanComplexExpression
    
    
    BooleanComplexExpression
      = Identifier
      / "(" logical_or:BooleanOrExpression ")" { return logical_or; }
    
    Identifier
      = token:[a-zA-Z0-9_]+ { return token.join(""); }
    
    ws 
      = [ \t]
    
    lb
      = [\r\n]
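
To make the effect of the lookahead easier to follow, here is a rough hand-written recursive-descent sketch of the same idea in plain JavaScript. It is not what PEG.js generates, and every name in it (parseScope, colonStartsBody, and so on) is made up for illustration. It assumes the simplified grammar above, leaves out parenthesized conditions and trailing line breaks to stay short, and returns null on any parse failure.

```javascript
// Illustrative sketch only: mirrors the grammar above by hand, without PEG.js.
function parseScope(input) {
  let pos = 0;

  // ws* : skip spaces and tabs
  const skipWs = () => { while (input[pos] === " " || input[pos] === "\t") pos++; };

  // Identifier = [a-zA-Z0-9_]+
  function identifier() {
    const m = /^[a-zA-Z0-9_]+/.exec(input.slice(pos));
    if (!m) return null;
    pos += m[0].length;
    return m[0];
  }

  // SingleLineStatement = Identifier ws* "=" ws* Identifier
  // Restores pos on failure, like a failed PEG sequence.
  function assignment() {
    const start = pos;
    const left = identifier();
    skipWs();
    if (left === null || input[pos] !== "=") { pos = start; return null; }
    pos++;
    skipWs();
    const right = identifier();
    if (right === null) { pos = start; return null; }
    return { type: "ASSIGN", left, right };
  }

  // Plays the role of the (":" ws* SingleLineStatement) predicate:
  // reports whether the colon at pos starts the scope body, consuming nothing.
  function colonStartsBody() {
    const start = pos;
    pos++; // step over ":"
    skipWs();
    const found = assignment() !== null;
    pos = start;
    return found;
  }

  // BooleanNotExpression = "!" ws* BooleanNotExpression / Identifier
  function notExpr() {
    if (input[pos] === "!") {
      pos++;
      skipWs();
      const operand = notExpr();
      return operand && { type: "NOT", operand };
    }
    return identifier();
  }

  // BooleanAndExpression: only treat ":" as AND when it does not start the body.
  function andExpr() {
    const left = notExpr();
    if (left === null) return null;
    const start = pos;
    skipWs();
    if (input[pos] === ":" && !colonStartsBody()) {
      pos++;
      skipWs();
      const right = andExpr();
      if (right !== null) return { type: "AND", left, right };
    }
    pos = start; // fall back to the plain NOT-level expression
    return left;
  }

  // BooleanOrExpression = And ws* "|" ws* Or / And
  function orExpr() {
    const left = andExpr();
    if (left === null) return null;
    const start = pos;
    skipWs();
    if (input[pos] === "|") {
      pos++;
      skipWs();
      const right = orExpr();
      if (right !== null) return { type: "OR", left, right };
    }
    pos = start;
    return left;
  }

  // ScopeStatement = BooleanExpression ws* ":" ws* SingleLineStatement
  const bool = orExpr();
  skipWs();
  if (bool === null || input[pos] !== ":") return null;
  pos++;
  skipWs();
  const state = assignment();
  if (state === null || pos !== input.length) return null;
  return { bool, state };
}

console.log(JSON.stringify(parseScope("win32:xml: x = y"), null, 2));
```

For win32:xml: x = y the first colon is taken as AND because what follows it (xml:) is not an assignment, while the second colon is left alone because x = y is one. The result is an AND node over win32 and xml plus an ASSIGN node for x = y. The cost of this approach is that the assignment is parsed twice (once in the lookahead, once for real), which is usually acceptable for short qmake lines.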