Search code examples
prolognlpgrammarbnfdcg

Is there a BNF with arguments for non-terminal symbols?


In working with Prolog DCG to parse input it is nice to have an accompaning BNF of the grammar.

For example:

BNF

<Sentence> ::= <Noun_phrase> <Verb_phrase>
<Noun_phrase> ::= <Determiner> <Noun>
<Verb_phrase> ::= <Verb> <Phrase>
<Determiner> ::= a
<Determiner> ::= the
<Noun> ::= cat
<Noun> ::= mouse
<Verb> ::= scares
<Verb> ::= hates

as Prolog DCG

sentence --> noun_phrase, verb_phrase.
verb_phrase --> verb, noun_phrase.
noun_phrase --> determiner, noun.
determiner --> [a].
determiner --> [the].
noun --> [cat].
noun --> [mouse].
verb --> [scares].
verb --> [hates].

However Prolog DCG can also have arguments as
in this example Number for singular or plural

sentence(Number) --> noun_phrase(Number), verb_phrase(Number).
verb_phrase(Number) --> verb(Number), noun_phrase(Number).
noun_phrase(Number) --> determiner(Number), noun(Number).
determiner(singular) --> [a].
determiner(singular) --> [the].
determiner(plural) --> [the].
noun(singular) --> [cat].
noun(plural) --> [cats].
noun(singular) --> [mouse].
noun(plural) --> [mice].
verb(singular) --> [scares].
verb(plural) --> [scare].
verb(singular) --> [hates].
verb(plural) --> [hate].

Is there a standard or accepted extension to BNF that includes arguments for non-terminals?

If so I need a link to it.

I suspect that ATN (Augmented Transition Networks) are in the ball park and may be the only standard answer, but I am hoping for something that is linear text as opposed to some form vertex/edge graph.


Solution

  • I think the concept of feature structures is what you're looking for; the sharing of arguments you show in your example is a special case of the more general feature structure unification approach.

    I'm not aware of feature structure extensions for BNF specifically, but there are reasonably accepted notations for adding them to other grammar formalisms. The documentation for NLTK (a Python natural language processing library) has an example here of the notation they use. Here are some of their rules that correspond roughly to the first few productions from your example:

    S -> NP[CASE=nom, AGR=?a] VP[AGR=?a]
    VP[AGR=?a] -> TV[OBJCASE=?c, AGR=?a] NP[CASE=?c]
    NP[CASE=?c, AGR=?a] -> Det[CASE=?c, AGR=?a] N[CASE=?c, AGR=?a]
    

    The ?x is their notation for logic variables. That entire chapter of the NLTK manual contains a general desciption of feature structures and includes some literature references.