Search code examples
pythonregexcompiler-construction

Python regex test the sentence is valid


ACTIVE_LIST = ACTOR | ACTIVE_LIST and ACTOR
ACTOR = NOUN | ARTICLE NOUN
ARTICLE = a | the
NOUN = tom | jerry | goofy | mickey | jimmy | dog | cat | mouse

By applying above rule I can generate

a tom 
tom and a jerry 
the tom and a jerry 
the tom and a jerry and tom and dog

but not

Tom 
the Tom and me

can I check the sentence is correct by only using python re module. I know how to match certain char by [abc] but don't know about word. Actually I am trying to solve this ACM problem. If someone assist me partially I can do the rest. This is my 1st question at this arena. Any suggestion or improvement highly appreciated.


Solution

  • Yes, you can write that as a regex pattern, because the grammar is regular. The regular expression will be pretty long, but it could be generated in a fairly straight-forward way; once you have the regex, you just compile it and apply it to each input.

    The key is to turn regular rules into repetitions. For example,

    STATEMENT = ACTION | STATEMENT , ACTION
    

    can be turned into

    ACTION (, ACTION)*
    

    Of course, that's just a part of the problem, because you'd first have to have transformed ACTION into a regular expression in order to create the regex for STATEMENT.

    The problem description glosses over an important issue, which is that the input does not just consist of lower-case alphabetic characters and commas. It also contains spaces, and the regular expression needs to insist on spaces at appropriate points. For example, the , above probably must (and certainly might) be followed by one (or more) spaces. It might be ok if it were preceded by a one or more spaces, too; the problem description isn't clear.

    So the correction regular expression for NOUN will actually turn out to be:

    ((a|the) +)?(tom|jerry|goofy|mickey|jimmy|dog|cat|mouse)
    

    (I also found it interesting that the grammar as presented lets VERB match "hatesssssssss". I have no idea whether that was intentional.)