ACTIVE_LIST = ACTOR | ACTIVE_LIST and ACTOR
ACTOR = NOUN | ARTICLE NOUN
ARTICLE = a | the
NOUN = tom | jerry | goofy | mickey | jimmy | dog | cat | mouse
By applying above rule I can generate
a tom
tom and a jerry
the tom and a jerry
the tom and a jerry and tom and dog
but not
Tom
the Tom and me
can I check the sentence is correct by only using python re module. I know how to match certain char by [abc] but don't know about word. Actually I am trying to solve this ACM problem. If someone assist me partially I can do the rest. This is my 1st question at this arena. Any suggestion or improvement highly appreciated.
Yes, you can write that as a regex pattern, because the grammar is regular. The regular expression will be pretty long, but it could be generated in a fairly straight-forward way; once you have the regex, you just compile it and apply it to each input.
The key is to turn regular rules into repetitions. For example,
STATEMENT = ACTION | STATEMENT , ACTION
can be turned into
ACTION (, ACTION)*
Of course, that's just a part of the problem, because you'd first have to have transformed ACTION
into a regular expression in order to create the regex for STATEMENT
.
The problem description glosses over an important issue, which is that the input does not just consist of lower-case alphabetic characters and commas. It also contains spaces, and the regular expression needs to insist on spaces at appropriate points. For example, the ,
above probably must (and certainly might) be followed by one (or more) spaces. It might be ok if it were preceded by a one or more spaces, too; the problem description isn't clear.
So the correction regular expression for NOUN
will actually turn out to be:
((a|the) +)?(tom|jerry|goofy|mickey|jimmy|dog|cat|mouse)
(I also found it interesting that the grammar as presented lets VERB
match "hatesssssssss". I have no idea whether that was intentional.)