parsing logic artificial-intelligence antlr4 first-order-logic

Why there is binary formula (thf_binary_formula) in the parse tree of TPTP (Thousand Problems...) formula?

I am trying to create library that creates Abstract Syntax Tree from the formulas of TPTP (Thousand Problems for Theorem Provers) language http://tptp.org/. There is nice ANTRL4 grammar https://github.com/TobiasGleissner/TPTP-ANTLR4-Grammar for TPTP which I am using to generate parser. I have formula (TPTP expression) (![A:animal]:?[H:human]:H=owner_of(A)) and it has beautified (pretty print) parse tree that is generated by the standard ANTLR4 parser created from the referenced grammar:

thf_formula
  thf_logic_formula
    thf_unitary_formula (
      thf_logic_formula
        thf_binary_formula
          thf_binary_pair
            thf_unitary_formula
              thf_quantified_formula
                thf_quantification
                  thf_quantifier
                    fof_quantifier !
                  [
                  thf_variable_list
                    thf_variable
                      thf_typed_variable
                        variable A
                        :
                        thf_top_level_type
                          thf_unitary_type
                            thf_unitary_formula
                              thf_atom
                                thf_function
                                  atom
                                    untyped_atom
                                      constant
                                        functor
                                          atomic_word animal
                  ]:
                thf_unitary_formula
                  thf_quantified_formula
                    thf_quantification
                      thf_quantifier
                        fof_quantifier ?
                      [
                      thf_variable_list
                        thf_variable
                          thf_typed_variable
                            variable H
                            :
                            thf_top_level_type
                              thf_unitary_type
                                thf_unitary_formula
                                  thf_atom
                                    thf_function
                                      atom
                                        untyped_atom
                                          constant
                                            functor
                                              atomic_word human
                      ]:
                    thf_unitary_formula
                      thf_atom
                        variable H
            thf_pair_connective =
            thf_unitary_formula
              thf_atom
                thf_function
                  functor
                    atomic_word owner_of
                  (
                  thf_arguments
                    thf_formula_list
                      thf_logic_formula
                        thf_unitary_formula
                          thf_atom
                            variable A
                  )
      )

As usually - the raw parse tree is quite complex, but I understand every part of it except - and that is my question - why there is thf_binary_formula and thf_binary_pair in the parese tree? As I understand then TPTP binary formula is for binary connectives (conjunction, disjunction, implication), but my formula have none of them, my formula have just equality function = and two quantifiers which both form nested unary formula.

So - what is the meaning of the TPTP binary formula and why it appears in my parse tree for this simple formula without binary connectives?

Solution

There's no real answer possible here, other than: because that is how the author of the grammar defined the rules :)

Let's look at the following very simple grammar:

grammar Expr;

parse
 : expr EOF
 ;

expr
 : add_expr
 ;

add_expr
 : mult_expr ( ('+' | '-') mult_expr)*
 ;

mult_expr
 : atom ( ('*' | '/') atom)*
 ;

atom
 : '(' expr ')'
 | NUMBER
 ;

NUMBER
 : ( [0-9]* '.' )? [0-9]+
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

Because add_expr is placed before mult_expr, input like 1+2*3 will cause the * operator the have a higher precedence than the + operator. This results in the following parse tree:

However, because the grammar is written in this way, the parse tree will also have (empty) add_expr and mult_expr nodes for the simple number 1 when this is parsed:

That is why you see empty nodes in your parse tree you might not expect.