
Python - Checking whether all the words in a line exist in an array

NOTE: I cannot use any imports other than sys and io for this question.

For an assignment, I have to take in two files as command-line arguments; both files contain lines of strings.

To get my assignment working, I want to read one line at a time from one file and check whether all of the words in that line are present in the other file.

Here are the files:

g1.ecfg:

S -> NP VP
NP -> Det N
NP -> PN
Det -> "the" 
N -> "dog" 
N -> "rat" 
N -> "elephant"
PN -> "Alice"
PN -> "Bob"
VP -> V NP
V -> "admired" 
V -> "bit" 
V -> "chased"

u1a.utt:

the aardvark bit the dog
the dog bit the man
Bob killed Alice

So I want to read each line in u1a.utt and check that every word in that line is found in g1.ecfg.

I figured the quotation marks in g1.ecfg might be a problem, so I put every word that appears in quotes into a list, with the quotes stripped.
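A minimal sketch of the find-based extraction described above, assuming each grammar line holds at most one quoted word (true for g1.ecfg). `quoted_word` is a hypothetical helper name. The guard matters: `str.find` returns -1 when a line has no quote, so an unguarded version computes `start = 0` and `end = line.find('"', 0) = -1`, and `line[0:-1]` adds almost the whole rule line (for example `NP -> Det N`) to the word list.

```python
def quoted_word(line):
    """Return the text between the first pair of double quotes, or None."""
    start = line.find('"')
    if start == -1:          # str.find returns -1 when there is no quote
        return None
    end = line.find('"', start + 1)
    if end == -1:            # opening quote with no closing quote
        return None
    return line[start + 1:end]

# A few lines in the style of g1.ecfg (hypothetical sample, not read from disk)
lines = ['NP -> Det N\n', 'Det -> "the" \n', 'PN -> "Alice"\n']

# Keep only the lines that actually contained a quoted word
words_in_quotes = [w for w in map(quoted_word, lines) if w is not None]
print(words_in_quotes)  # ['the', 'Alice']
```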

My current code always evaluates to False, which prints "No valid parse" even when a line is supposed to print "Parsing!!!".

Can someone help me understand how to compare the words in each line with the g1 file?

Here is my code:

import sys
import io

# usage = python g#.ecfg u#L.utt

# Command Line Arguments - argv[0], argv[1], argv[2]
script = sys.argv[0]
grammarFile = open(sys.argv[1])
utteranceFile = open(sys.argv[2])

# Initialize rules from grammarFile

ruleArray = []
wordsInQuotes = []
uttWords = []

for line in grammarFile:
    rule = line.rstrip('\n')
    start = line.find('"') + 1
    end = line.find('"', start)
    wordsInQuotes.append(line[start:end])    #create a set of words from grammar file

for line in utteranceFile:
    x = line.split()
    print x
    if (all(x in grammarFile for x in line)):    #if all words found in grammarFile
        print "Parsing!!!"
    else:
        print "No valid parse"

I think it may have something to do with whether my lists are hashable, or maybe a scoping issue, but I'm struggling to find an alternative that works for me.
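Hashability is not what breaks the condition `all(x in grammarFile for x in line)`. The quick check below uses `io.StringIO` as a stand-in for the open grammar file (an assumption made so the demo runs without files) and shows two separate problems: `for x in line` walks over characters rather than words, and `x in grammarFile` compares against whole lines of the file while consuming it. In the original script the earlier `for line in grammarFile:` loop has already exhausted the file, so every membership test is False before the utterance loop starts.

```python
import io

# io.StringIO stands in for the open grammar file (assumption for this demo)
grammar = io.StringIO('Det -> "the"\nN -> "dog"\n')
line = "the dog\n"

# Problem 1: iterating over a string yields characters, not words
chars = [x for x in line][:3]
print(chars)      # ['t', 'h', 'e']

# Problem 2: `in` on a file object compares against whole lines
# (including the trailing newline) and consumes the file as it scans
found = "the" in grammar
print(found)      # False: no line is exactly "the"

leftover = list(grammar)
print(leftover)   # []: the file is already exhausted
```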


  • Let's use sets to store items we will later check for membership, and use str.split to find the words in quotes.

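    import sys

    # Collect every quoted word (terminal) from the grammar file into a set
    with open(sys.argv[1]) as f:
        words = set()
        for line in f:
            quoted = [a for a in line.split() if '"' in a]
            for a in quoted:
                words.add(a.replace('"', ''))

    # Accept a line only if every one of its words is a known terminal
    with open(sys.argv[2]) as f:
        for line in f:
            if all(a in words for a in line.split()):
                print("Good Parse")
            else:
                print("Word not found")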
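To try the answer's logic without creating files, the same approach can be wrapped in functions and fed `io.StringIO` objects (io is one of the two allowed modules). `terminal_words` and `check_utterances` are hypothetical helper names, and the last utterance, "the dog chased Bob", is an invented line added only to show a passing case; the other three lines are the sample u1a.utt.

```python
import io

def terminal_words(grammar_lines):
    """Return the set of quoted words (terminals) found in the grammar lines."""
    words = set()
    for line in grammar_lines:
        for token in line.split():
            if '"' in token:
                words.add(token.replace('"', ''))
    return words

def check_utterances(grammar_lines, utterance_lines):
    """Return (utterance, ok) pairs; ok is True when every word is a terminal."""
    words = terminal_words(grammar_lines)
    return [(u.strip(), all(w in words for w in u.split()))
            for u in utterance_lines]

GRAMMAR = '''S -> NP VP
NP -> Det N
NP -> PN
Det -> "the"
N -> "dog"
N -> "rat"
N -> "elephant"
PN -> "Alice"
PN -> "Bob"
VP -> V NP
V -> "admired"
V -> "bit"
V -> "chased"
'''

# First three lines are u1a.utt; the fourth is invented to show a passing case
UTTERANCES = '''the aardvark bit the dog
the dog bit the man
Bob killed Alice
the dog chased Bob
'''

results = check_utterances(io.StringIO(GRAMMAR), io.StringIO(UTTERANCES))
for utterance, ok in results:
    print("Good Parse" if ok else "Word not found", "-", utterance)
```

With the sample u1a.utt, each of the three original lines contains a word that is not a terminal in g1.ecfg ("aardvark", "man", "killed"), so rejecting all three is the expected result for that data. In the command-line version, `open(sys.argv[1])` and `open(sys.argv[2])` would take the place of the `io.StringIO` objects.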