
Python - Checking whether all the words in a line exist in an array


NOTE: I cannot use any imports other than sys and io for this question.

For an assignment, I have to take two files as command-line arguments; both files contain lines of text.

To get my assignment working, I want to read one file a line at a time and check whether every word in that line is present in the other file.

Here are the files:

g1.ecfg

S -> NP VP
NP -> Det N
NP -> PN
Det -> "the" 
N -> "dog" 
N -> "rat" 
N -> "elephant"
PN -> "Alice"
PN -> "Bob"
VP -> V NP
V -> "admired" 
V -> "bit" 
V -> "chased"

u1a.utt

the aardvark bit the dog
the dog bit the man
Bob killed Alice

So I want to read each line of u1a.utt and check that every word in it appears in g1.ecfg.

I figured the quotation marks in g1.ecfg might be a problem, so I stored every quoted word in a list with the quotes stripped.
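One thing to watch in the extraction loop of the code below: str.find returns -1 when a line contains no quote, so for a rule such as S -> NP VP the slice becomes line[0:-1] and the whole rule text is appended to wordsInQuotes. Here is a minimal sketch of the extraction that skips quote-free lines and also picks up several quoted words on one line. It assumes every terminal is enclosed in a matching pair of double quotes; the function name quoted_words is illustrative, and the sample lines come from g1.ecfg.

```python
def quoted_words(lines):
    """Return the set of words wrapped in double quotes in the given lines.

    Assumes each terminal is enclosed in a matching pair of double quotes,
    as in g1.ecfg; lines with no quotes contribute nothing.
    """
    words = set()
    for line in lines:
        start = line.find('"')
        while start != -1:                      # -1 means no (more) quotes on this line
            end = line.find('"', start + 1)
            if end == -1:                       # unmatched quote: stop scanning this line
                break
            words.add(line[start + 1:end])      # text between the quotes
            start = line.find('"', end + 1)     # look for the next quoted word
    return words


grammar = [
    'S -> NP VP\n',
    'Det -> "the" \n',
    'N -> "dog" \n',
    'PN -> "Alice"\n',
]
print(sorted(quoted_words(grammar)))  # ['Alice', 'dog', 'the']
```

Returning a set rather than a list also makes the later "is this word known?" check a fast membership test.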

My current code always evaluates to False, so it prints "No valid parse" even for lines that should print "Parsing!!!".

Can someone help me understand how to compare the words in each line with the g1 file?

Here is my code:

import sys
import io

# usage = python CKYdet.py g#.ecfg u#L.utt

# Command Line Arguments - argv[0], argv[1], argv[2]
script = sys.argv[0]
grammarFile = open(sys.argv[1])
utteranceFile = open(sys.argv[2])

# Initialize rules from grammarFile

ruleArray = []
wordsInQuotes = []
uttWords = []

for line in grammarFile:
    rule = line.rstrip('\n')
    start = line.find('"') + 1
    end = line.find('"', start)
    ruleArray.append(rule)
    wordsInQuotes.append(line[start:end])    #create a set of words from grammar file


for line in utteranceFile:
    x = line.split()
    print x
    if (all(x in grammarFile for x in line)):    #if all words found in grammarFile
        print "Parsing!!!"
    else:
        print "No valid parse"

I think it may have something to do with my lists being hashable or not, or maybe an issue of scope, but I'm struggling to find an alternative that works for me.
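The check fails for reasons unrelated to hashability or scope. This small reproduction uses io.StringIO (allowed under the sys/io restriction) as a stand-in for the opened files, with sample text taken from the question; the commented output assumes Python 3.

```python
import io

# io.StringIO stands in for the opened files: iterating over it yields
# one line at a time, just like a file object.
grammar_file = io.StringIO(u'N -> "dog"\nV -> "bit"\n')
for line in grammar_file:   # the first loop reads the "file" to the end
    pass

# 1. The iterator is now exhausted, so `in` has nothing left to compare.
print(u'N -> "dog"\n' in grammar_file)   # False

# 2. Even after rewinding, `in` compares whole lines, not individual words.
grammar_file.seek(0)
print(u"dog" in grammar_file)            # False

# 3. Iterating over a string yields characters, not words.
line = u"the dog bit the man\n"
print(list(line)[:4])                    # ['t', 'h', 'e', ' ']
print(line.split())                      # ['the', 'dog', 'bit', 'the', 'man']
```

In the original loop, the generator iterates over line (characters) instead of x (the split words), and each character is tested against an exhausted file, so all() returns False for every non-empty line.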


Solution

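  • Your check never succeeds for two reasons: for x in line iterates over the characters of the line rather than the words in x, and grammarFile has already been read to the end by the first loop, so x in grammarFile finds nothing. Let's use sets to store the items we will later check for membership, and use str.split to find the words in quotes.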

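    with open('grammarfile') as f:
        words = set()  # a set gives fast membership tests
        for line in f:
            # keep only the tokens that contain a quote, e.g. "dog"
            quoted = [a for a in line.split() if '"' in a]
            for a in quoted:
                words.add(a.replace('"', ''))  # store the word without its quotes

    with open('utterancefile') as f:
        for line in f:
            # a line passes only if every whitespace-separated word is a known terminal
            if all(a in words for a in line.split()):
                print("Good Parse")
            else:
                print("Word not found")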
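If it helps to see which word caused a failure, here is a sketch that stays within the sys/io restriction, takes the two file names from the command line as in the question's usage comment, and reports the unknown words in each utterance with a set difference. The function names are illustrative, the UTF-8 encoding is an assumption, and terminals are assumed to be single tokens wrapped in double quotes, as in g1.ecfg.

```python
import io
import sys


def load_terminals(lines):
    """Collect quoted tokens such as "dog" from grammar lines, without quotes."""
    words = set()
    for line in lines:
        for token in line.split():
            if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
                words.add(token[1:-1])
    return words


def check_utterances(lines, terminals):
    """Yield (utterance, sorted list of words not found in terminals)."""
    for line in lines:
        words = line.split()
        if not words:                       # skip blank lines
            continue
        missing = sorted(set(words) - terminals)
        yield " ".join(words), missing


def main(argv):
    if len(argv) != 3:
        sys.stderr.write("usage: python CKYdet.py g#.ecfg u#L.utt\n")
        return
    with io.open(argv[1], encoding="utf-8") as grammar_file:
        terminals = load_terminals(grammar_file)
    with io.open(argv[2], encoding="utf-8") as utterance_file:
        for utterance, missing in check_utterances(utterance_file, terminals):
            if missing:
                print("No valid parse: %s (unknown: %s)" % (utterance, ", ".join(missing)))
            else:
                print("Parsing!!!: %s" % utterance)


if __name__ == "__main__":
    main(sys.argv)
```

With the sample files from the question, all three utterances are reported as unparsable, because aardvark, man and killed do not appear in g1.ecfg.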