Search code examples
pythonfilepython-3.xindexingenumerate

How to determine if one string is longer than another, and then print out where the mismatch occurs?


I am writing a program which takes two user inputted files and determines if there are any mismatches between the two. If one file is longer than the other, I want to print that there was no character in the shorter file to compare at that index to the longer file. As of now my program iterates through both files and prints where mismatches occur, but stops once it reaches the end of the shorter file.

My code is as follows:

def character_difference(userfile1, userfile2):
    opened_file_1 = open(userfile1)
    opened_file_2 = open(userfile2)
    f1 = opened_file_1.read(-1)
    f2 = opened_file_2.read(-1)
    for index, (char1, char2) in enumerate(zip(f1, f2)):
        if char1 != char2:
            print("Mismatch at character", index, "%s != %s" % (char1, char2))
        else:
            continue
    opened_file_1.close()
    opened_file_2.close()
def main():
    userfile1 = input("Enter the name of the first file: ")
    userfile2 = input("Enter the name of the second file: ")
    character_difference(userfile1, userfile2)

main()

How would I go about adding the print statement that says there is no character where the empty space is? I'm not sure how to keep enumerating over the rest of the longer string.


Solution

  • zip will only iterate till the shortest of the iterables passed to it. Quoting the documentation,

    The iterator stops when the shortest input iterable is exhausted.

    Instead, use itertools.zip_longest to iterate till the longest of the iterables and use the default value for the shorter iterables. Quoting the documentation,

    If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.

    For example,

    from itertools import zip_longest
    
    with open(userfile1) as f1, open(userfile2) as f2:
      f1 = f1.read(-1)
      f2 = f2.read(-1)
    
    for idx, (c1, c2) in enumerate(zip_longest(f1, f2, fillvalue="")):
      if c1 != c2:
        print("Mismatch at character", idx, "%s != %s" % (c1, c2))