Hi, I want to use the startswith function to print out the lines in fileY.txt which are NOT partially matched with the lines in fileX.txt.
In the script below I load fileX.txt and fileY.txt as lists. I then search fileX.txt for a partial match with fileY.txt using the startswith function.
Next I attempt to print the lines which are NOT partially matched between fileX.txt and fileY.txt. However, the script only prints the last line in fileY.txt.
Any help or suggestions will be appreciated (I don't mind if I have to use a helper app like sed, for example).
Source:
#load lines from file into lists
lines1 = [line1.rstrip('\n') for line1 in open('fileX.txt')]
lines2 = [line2.rstrip('\n') for line2 in open('fileY.txt')]
#set lines
set_of_lines1 = set(lines1)
set_of_lines2 = set(lines2)
#set common
common = set_of_lines1 & set_of_lines2
#return lines which partially match as variable e
[e for e in lines1 if e.startswith(tuple(lines2))]
#minus partially matched lines from fileY.txt
difference = set_of_lines2 - e
#print the non matching lines
for color in difference:
    print 'The color prefix ' + color + ' does not exist in the list'
fileX.txt:
blue
green
red
fileY.txt:
blu
gre
re
whi
oran
What I want:
C:\Users\Foo\Bar\Python\Test\>C:\python27\python Test.py
The color prefix whi does not exist in the list
The color prefix oran does not exist in the list
Press any key to continue . . .
The first problem is with this line:
[e for e in lines1 if e.startswith(tuple(lines2))]
It constructs a list of partial matches, and then throws it away. All you retain is the value of e, which has leaked out of the list comprehension (in Python 3 the loop variable does not leak, so referencing e afterwards would raise a NameError). You need:
partial_match = [e for e in lines1 if e.startswith(tuple(lines2))]
which brings us to the second problem. If you print out partial_match, you will see that it contains ['blue', 'green', 'red'], and I think you are expecting it to contain ['blu', 'gre', 're'], because you are trying to do a set difference between it and set(['blu', 're', 'gre', 'whi', 'oran']).
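To make the mismatch concrete, here is a small sketch with the file contents inlined as lists (an assumption, so that it runs without fileX.txt and fileY.txt). It suggests why the set difference cannot work as written: the comprehension collects full colour names, and none of those appears among the prefixes.

```python
# File contents inlined as lists so the sketch runs without the files.
lines1 = ['blue', 'green', 'red']              # fileX.txt
lines2 = ['blu', 'gre', 're', 'whi', 'oran']   # fileY.txt

# str.startswith accepts a tuple of prefixes and is true if any of them matches.
partial_match = [e for e in lines1 if e.startswith(tuple(lines2))]
print(partial_match)        # ['blue', 'green', 'red']

# None of the full names is itself a prefix, so the difference removes nothing.
difference = set(lines2) - set(partial_match)
print(sorted(difference))   # ['blu', 'gre', 'oran', 're', 'whi']
```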
Since your problems revolve around the list comprehension I suggest you unwind it into a loop where you can print out intermediate values so you can see what is going on and get the logic right. If you really want a one-liner you can always rewrite it later.
Like this:
matches = []
for prefix in lines2:
    for colour in lines1:
        if colour.startswith(prefix):
            matches.append(prefix)
matches will now contain ['blu', 'gre', 're']. Now report on the prefixes that are not matches.
for nomatch in set(lines2) - set(matches):
    print "The color prefix %r does not exist in the list" % nomatch
This will give you the output (the order of the lines may vary, because sets are unordered):
The color prefix 'whi' does not exist in the list
The color prefix 'oran' does not exist in the list
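If you do want a one-liner once the logic is clear, one possible sketch uses any() inside a list comprehension. The file contents are inlined as lists here (an assumption, so the snippet runs on its own), and the name unmatched is just illustrative. Unlike the set difference, this version keeps the prefixes in the order they appear in fileY.txt.

```python
# File contents inlined as lists so the sketch runs without the files.
lines1 = ['blue', 'green', 'red']              # fileX.txt
lines2 = ['blu', 'gre', 're', 'whi', 'oran']   # fileY.txt

# Keep each prefix that no colour starts with, preserving fileY.txt order.
unmatched = [prefix for prefix in lines2
             if not any(colour.startswith(prefix) for colour in lines1)]

for prefix in unmatched:
    print("The color prefix %r does not exist in the list" % prefix)
```

Note that an empty line in fileY.txt would count as a prefix of every colour, so you may want to skip blank lines when loading the file.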