I am trying to create library that creates Abstract Syntax Tree from the formulas of TPTP (Thousand Problems for Theorem Provers) language http://tptp.org/. There is nice ANTRL4 grammar https://github.com/TobiasGleissner/TPTP-ANTLR4-Grammar for TPTP which I am using to generate parser. I have formula (TPTP expression) (![A:animal]:?[H:human]:H=owner_of(A))
and it has beautified (pretty print) parse tree that is generated by the standard ANTLR4 parser created from the referenced grammar:
thf_formula
thf_logic_formula
thf_unitary_formula (
thf_logic_formula
thf_binary_formula
thf_binary_pair
thf_unitary_formula
thf_quantified_formula
thf_quantification
thf_quantifier
fof_quantifier !
[
thf_variable_list
thf_variable
thf_typed_variable
variable A
:
thf_top_level_type
thf_unitary_type
thf_unitary_formula
thf_atom
thf_function
atom
untyped_atom
constant
functor
atomic_word animal
]:
thf_unitary_formula
thf_quantified_formula
thf_quantification
thf_quantifier
fof_quantifier ?
[
thf_variable_list
thf_variable
thf_typed_variable
variable H
:
thf_top_level_type
thf_unitary_type
thf_unitary_formula
thf_atom
thf_function
atom
untyped_atom
constant
functor
atomic_word human
]:
thf_unitary_formula
thf_atom
variable H
thf_pair_connective =
thf_unitary_formula
thf_atom
thf_function
functor
atomic_word owner_of
(
thf_arguments
thf_formula_list
thf_logic_formula
thf_unitary_formula
thf_atom
variable A
)
)
As usually - the raw parse tree is quite complex, but I understand every part of it except - and that is my question - why there is thf_binary_formula
and thf_binary_pair
in the parese tree? As I understand then TPTP binary formula is for binary connectives (conjunction, disjunction, implication), but my formula have none of them, my formula have just equality function =
and two quantifiers which both form nested unary formula.
So - what is the meaning of the TPTP binary formula and why it appears in my parse tree for this simple formula without binary connectives?
There's no real answer possible here, other than: because that is how the author of the grammar defined the rules :)
Let's look at the following very simple grammar:
grammar Expr;
parse
: expr EOF
;
expr
: add_expr
;
add_expr
: mult_expr ( ('+' | '-') mult_expr)*
;
mult_expr
: atom ( ('*' | '/') atom)*
;
atom
: '(' expr ')'
| NUMBER
;
NUMBER
: ( [0-9]* '.' )? [0-9]+
;
SPACES
: [ \t\r\n]+ -> skip
;
Because add_expr
is placed before mult_expr
, input like 1+2*3
will cause the *
operator the have a higher precedence than the +
operator. This results in the following parse tree:
However, because the grammar is written in this way, the parse tree will also have (empty) add_expr
and mult_expr
nodes for the simple number 1
when this is parsed:
That is why you see empty nodes in your parse tree you might not expect.