NOTE: I cannot use any imports other than sys and io for this question
For an assignment I have to take in two files as system arguments and both files contain lines of strings.
To get my assignment working, I want to read one line at a time from one file and check whether all of the words in that line are present in the other file.
Here are the files:
g1.ecfg
S -> NP VP
NP -> Det N
NP -> PN
Det -> "the"
N -> "dog"
N -> "rat"
N -> "elephant"
PN -> "Alice"
PN -> "Bob"
VP -> V NP
V -> "admired"
V -> "bit"
V -> "chased"
u1a.utt
the aardvark bit the dog
the dog bit the man
Bob killed Alice
So, I want to read each line in u1a.utt and check that each word in that line is found in g1.ecfg.
I figured that the quotation marks in g1 might be a problem, so I put all words that are in quotes in an array without the quotes left in.
My current code always evaluates to False, which prints "No valid parse" even when a line is supposed to print "Parsing!!!".
Can someone help me understand how to compare the words in each line with the g1 file?
Here is my code:
import sys
import io
# usage = python CKYdet.py g#.ecfg u#L.utt
# Command Line Arguments - argv[0], argv[1], argv[2]
script = sys.argv[0]
grammarFile = open(sys.argv[1])
utteranceFile = open(sys.argv[2])
# Initialize rules from grammarFile
ruleArray = []
wordsInQuotes = []
uttWords = []
for line in grammarFile:
    rule = line.rstrip('\n')
    start = line.find('"') + 1
    end = line.find('"', start)
    ruleArray.append(rule)
    wordsInQuotes.append(line[start:end]) #create a set of words from grammar file
for line in utteranceFile:
    x = line.split()
    print x
    if (all(x in grammarFile for x in line)): #if all words found in grammarFile
        print "Parsing!!!"
    else:
        print "No valid parse"
I think it may have something to do with my lists being hashable or not, or maybe an issue of scope, but I'm struggling to find an alternative that works for me.
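Your check all(x in grammarFile for x in line) fails for two reasons: it loops over the characters of line instead of the words in x, and it tests them against the file object, which the first loop has already read to the end, instead of against wordsInQuotes.
Let's use a set to store the items we will later check for membership, and use str.split to find the words in quotes.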
with open('grammarfile') as f:
    words = set()
    for line in f:
        # keep only the tokens that contain a double quote
        line = [a for a in line.split() if '"' in a]
        for a in line:
            # strip the quotes before storing the word
            words.add(a.replace('"', ''))

with open('utterancefile') as f:
    for line in f:
        # every word on the line must be in the set
        if all(a in words for a in line.split()):
            print("Good Parse")
        else:
            print("Word not found")
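If you want to try the same idea without touching the real files, here is a minimal sketch that uses io.StringIO as in-memory stand-ins, so it stays within the sys/io restriction. The helper names load_terminals and missing_words are made up for illustration, the grammar is an abbreviated copy of g1.ecfg, and the last utterance ("the dog bit Bob") is an extra line I added so that one case passes. It also reports which words were missing, which may help with debugging.

```python
import io

def load_terminals(grammar_lines):
    """Collect every double-quoted token from the grammar, without the quotes."""
    words = set()
    for line in grammar_lines:
        for token in line.split():
            if '"' in token:
                words.add(token.replace('"', ''))
    return words

def missing_words(utterance, words):
    """Return the words of one utterance that are not in the grammar, in order."""
    return [w for w in utterance.split() if w not in words]

# In-memory stand-ins for g1.ecfg and u1a.utt (abbreviated; hypothetical data).
grammar = io.StringIO(u'''S -> NP VP
Det -> "the"
N -> "dog"
PN -> "Alice"
PN -> "Bob"
V -> "bit"
''')
utterances = io.StringIO(u'''the aardvark bit the dog
the dog bit the man
Bob killed Alice
the dog bit Bob
''')

words = load_terminals(grammar)
results = []
for line in utterances:
    missing = missing_words(line, words)
    results.append(missing)
    if missing:
        print("Word not found: " + ", ".join(missing))
    else:
        print("Good Parse")
```

With the real files you would pass open(sys.argv[1]) and open(sys.argv[2]) in place of the StringIO objects. Note that this only checks vocabulary; it does not check that the words can be combined by the grammar rules.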