
How to extract lines based on a substring from two separate text files in Python?


I have two text files. The first one is of the form:

K, 6
J, 5
L, 4

The second file has the form:

K_1, 6
K_2, 5
J_1, 4
J_2, 4
J_3, 5
L_1, 4

I need output of the form:

K_1, 6, 6, same
K_2, 5, 6, different
J_1, 4, 5, different
J_2, 4, 5, different
J_3, 5, 5, same
L_1, 4, 4, same

Each output line starts with the two values from the corresponding line of the second file. The third value is looked up in the first file using the part of the key before the underscore (e.g. for K_1 the substring is K, so I need to pick the value 6 from the first file). If the second and third values are equal, the line should end with "same"; otherwise it should end with "different".

Finally, I need a count of the lines marked "same" and a count of the lines marked "different" in the output file.

I tried writing the following code, but it is not giving me the expected output:

m1 = open('TextFile_1.txt')
m2 = open('TextFile_2.txt')
result = open('Output.txt','w')

lookup_from = {}
l2=[]

for line1 in m1:
    z1 = line1.split(',')[0].strip()
    z2 = z1.split('_')[0].strip()
    z3 = line1.split(',')[1].strip()
    ZX = (z2, z1, z3)
    lookup_from[ZX] = 0

for line2 in m2:
    z11 = line2.split(',')[0].strip()
    z22 = z11.split('_')[0].strip()
    z33 = line2.split(',')[1].strip()
    if z22 in [x for x,_,_ in lookup_from]:
        z4 = (z22, z11, z33)
        z5 = z4 + tuple([x for _,_,x in lookup_from])
        l2.append(z5)

for i in l2:
    result.write(str(i)[1:-1]+'\n')
result.close()

Solution

  • You can avoid all the complicated lookups if you create a simple key, value dict from the first file and then generate the output on the fly whilst reading the second file:

    # Map each key in the first file to its value: {'K': '6', 'J': '5', 'L': '4'}.
    with open('TextFile_1.txt') as f1:
        lookup = dict([x.strip() for x in line.split(',')] for line in f1)
    
    with open('Output.txt', 'w') as out:
        with open('TextFile_2.txt') as f2:
            for line in f2:
                k, v = [x.strip() for x in line.split(',')]
                # 'K_1' -> 'K': fetch the value stored for the key's prefix.
                n = lookup[k.split('_')[0]]
                out.write(', '.join(
                    [k, v, n, 'same' if v == n else 'different']) + '\n')
    

    Output:

    K_1, 6, 6, same
    K_2, 5, 6, different
    J_1, 4, 5, different
    J_2, 4, 5, different
    J_3, 5, 5, same
    L_1, 4, 4, same
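
The question also asks for a count of "same" and "different" lines, which the snippet above does not produce. A minimal sketch of one way to add that, using `collections.Counter`, is shown here. The names `compare` and `write_report` and the layout of the two summary lines are assumptions made for illustration, since the question does not say how the counts should be formatted. Blank lines are skipped, which is also an assumption about the input.

```python
from collections import Counter


def compare(base_lines, detail_lines):
    """Return (output_lines, counts) for two iterables of 'key, value' lines."""
    # Build the lookup from the first file, skipping blank lines.
    lookup = {}
    for line in base_lines:
        if line.strip():
            key, value = (part.strip() for part in line.split(','))
            lookup[key] = value

    output = []
    counts = Counter()
    for line in detail_lines:
        if not line.strip():
            continue
        key, value = (part.strip() for part in line.split(','))
        expected = lookup[key.split('_')[0]]  # 'K_1' -> 'K'
        verdict = 'same' if value == expected else 'different'
        counts[verdict] += 1
        output.append(', '.join([key, value, expected, verdict]))
    return output, counts


def write_report(base_path, detail_path, out_path):
    """Write the comparison lines followed by the two counts (assumed layout)."""
    with open(base_path) as f1, open(detail_path) as f2:
        lines, counts = compare(f1, f2)
    with open(out_path, 'w') as out:
        for line in lines:
            out.write(line + '\n')
        out.write('same: {}\n'.format(counts['same']))
        out.write('different: {}\n'.format(counts['different']))
```

Calling `write_report('TextFile_1.txt', 'TextFile_2.txt', 'Output.txt')` on the sample files should write the six lines above followed by the two counts. Values are compared as strings, so `5` and `05` would count as different. A prefix missing from the first file raises `KeyError`.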