Search code examples
parsinglogicartificial-intelligenceantlr4first-order-logic

Why there is binary formula (thf_binary_formula) in the parse tree of TPTP (Thousand Problems...) formula?


I am trying to create library that creates Abstract Syntax Tree from the formulas of TPTP (Thousand Problems for Theorem Provers) language http://tptp.org/. There is nice ANTRL4 grammar https://github.com/TobiasGleissner/TPTP-ANTLR4-Grammar for TPTP which I am using to generate parser. I have formula (TPTP expression) (![A:animal]:?[H:human]:H=owner_of(A)) and it has beautified (pretty print) parse tree that is generated by the standard ANTLR4 parser created from the referenced grammar:

thf_formula
  thf_logic_formula
    thf_unitary_formula (
      thf_logic_formula
        thf_binary_formula
          thf_binary_pair
            thf_unitary_formula
              thf_quantified_formula
                thf_quantification
                  thf_quantifier
                    fof_quantifier !
                  [
                  thf_variable_list
                    thf_variable
                      thf_typed_variable
                        variable A
                        :
                        thf_top_level_type
                          thf_unitary_type
                            thf_unitary_formula
                              thf_atom
                                thf_function
                                  atom
                                    untyped_atom
                                      constant
                                        functor
                                          atomic_word animal
                  ]:
                thf_unitary_formula
                  thf_quantified_formula
                    thf_quantification
                      thf_quantifier
                        fof_quantifier ?
                      [
                      thf_variable_list
                        thf_variable
                          thf_typed_variable
                            variable H
                            :
                            thf_top_level_type
                              thf_unitary_type
                                thf_unitary_formula
                                  thf_atom
                                    thf_function
                                      atom
                                        untyped_atom
                                          constant
                                            functor
                                              atomic_word human
                      ]:
                    thf_unitary_formula
                      thf_atom
                        variable H
            thf_pair_connective =
            thf_unitary_formula
              thf_atom
                thf_function
                  functor
                    atomic_word owner_of
                  (
                  thf_arguments
                    thf_formula_list
                      thf_logic_formula
                        thf_unitary_formula
                          thf_atom
                            variable A
                  )
      )

As usually - the raw parse tree is quite complex, but I understand every part of it except - and that is my question - why there is thf_binary_formula and thf_binary_pair in the parese tree? As I understand then TPTP binary formula is for binary connectives (conjunction, disjunction, implication), but my formula have none of them, my formula have just equality function = and two quantifiers which both form nested unary formula.

So - what is the meaning of the TPTP binary formula and why it appears in my parse tree for this simple formula without binary connectives?


Solution

  • There's no real answer possible here, other than: because that is how the author of the grammar defined the rules :)

    Let's look at the following very simple grammar:

    grammar Expr;
    
    parse
     : expr EOF
     ;
    
    expr
     : add_expr
     ;
    
    add_expr
     : mult_expr ( ('+' | '-') mult_expr)*
     ;
    
    mult_expr
     : atom ( ('*' | '/') atom)*
     ;
    
    atom
     : '(' expr ')'
     | NUMBER
     ;
    
    NUMBER
     : ( [0-9]* '.' )? [0-9]+
     ;
    
    SPACES
     : [ \t\r\n]+ -> skip
     ;
    

    Because add_expr is placed before mult_expr, input like 1+2*3 will cause the * operator the have a higher precedence than the + operator. This results in the following parse tree:

    enter image description here

    However, because the grammar is written in this way, the parse tree will also have (empty) add_expr and mult_expr nodes for the simple number 1 when this is parsed:

    enter image description here

    That is why you see empty nodes in your parse tree you might not expect.