Search code examples
javapattern-matchingbnf

How to do BNF matching with a BNF parser?


I am using https://github.com/sylvainhalle/Bullwinkle/, a BNF parser. But it can only recognize a BNF pattern for single-whole-sentence, how to find BNF matches in a paragraph? I am trying to recognize certain commands in human's speaking. For a BNF example:

<S>: <lightcmd>;
<lightcmd>: <do> <obj> | <do> <a> <obj>;
<do>: turn on | turn off;
<a>: a | an | the;
<obj>: light;

I can parse the right command from speaking "turn on the light", but I can't parse it from "please turn on the light".


Solution

  • When you try to parse "please turn on the light" against that grammar, the parser will attempt to parse the entire utterance ... because that is what parsers do. But what you want is to match part of the utterance.

    Now you could deal with this something like the following:

    <S>: <words> <lightcmd> <words>;
    <lightcmd>: <do> <obj> | <do> <a> <obj>;
    <do>: turn on | turn off;
    <a>: a | an | the;
    <obj>: light;
    <words>: <word> | <words> <word>
    <word>: // one or more letters 
    

    but you start getting into problems with ambiguity.

    I think that a better idea would be to try to parse sub-sequences of the word stream. So if the user says

    please turn on a light and make a cup of tea
    

    you attempt to parse starting with "please". Then when that fails you try starting with "turn". Then when you reach "light" you ignore the rest of the words.

    I don't know if Bullwinkle can be used like that. If not, look for alternative parser generators.


    However, it should also be noted that a grammar as simple as this can also be written as a regular expression where the words are treated as the base symbols. You would only need a simple regular expression engine to handle this. (If this grammar is typical, you only need grouping and alternation. It should only be a couple of hours work to implement this from scratch.)