For-loop not inserting a line break when using zip_longest in Python 3


I am writing a simple text comparison tool. It takes two text files - a template and a target - and compares each character in each line using two for-loops. Any differences are highlighted with a Unicode full block symbol (\u2588). In case the target line is longer than the template line, I am using itertools.zip_longest to fill the nonexistent characters with a fill value.

from itertools import zip_longest

def compare(filename1, filename2):
    
    file1 = open(filename1, "r")
    file2 = open(filename2, "r")
    
    for line1, line2 in zip_longest(file1, file2):
    
        for char1, char2 in zip_longest(line1, line2, fillvalue=None):
            
            if char1 == char2:
                print(char2, end='')
            
            elif char1 == None:
                print('\u2588', end='')

compare('template.txt', 'target.txt')

Template file:        Target file:

First line            First lineXX
Second line           Second line
Third line            Third line

However, this appears to mess with Python's automatic line break placement. When a line ends with such a fill value, a line break is not generated, giving this result:

First line██Second line
Third line

Instead of:

First line██
Second line
Third line

The issue persisted after rewriting the script to use .append and .join (not shown to keep it short), though it allowed me to highlight the issue:

Result when both files are identical:

['F', 'i', 'r', 's', 't', ' ', 'l', 'i', 'n', 'e', '\n']
First line
['S', 'e', 'c', 'o', 'n', 'd', ' ', 'l', 'i', 'n', 'e', '\n']
Second line
['T', 'h', 'i', 'r', 'd', ' ', 'l', 'i', 'n', 'e']
Third line

Result when first line of target file has two more characters:

['F', 'i', 'r', 's', 't', ' ', 'l', 'i', 'n', 'e', '█', '█']
First line██['S', 'e', 'c', 'o', 'n', 'd', ' ', 'l', 'i', 'n', 'e', '\n']
Second line
['T', 'h', 'i', 'r', 'd', ' ', 'l', 'i', 'n', 'e']
Third line

As you can see, Python automatically adds a line break \n if the lines are of identical length, but as soon as zip_longest is involved, the last character in the list is the block, not a line break. Why does this happen?


Solution

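  • Python does not add line breaks on its own. Each line read from a file still ends with its '\n', and your loop only prints that character when it matches the character at the same position in the other line. With "First lineXX" in the target, the template's '\n' is paired with 'X' (nothing is printed) and the target's '\n' is paired with the fill value (a block is printed), so the line break is lost. Strip the line endings before comparing characters and print a new line after each line: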

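    from itertools import zip_longest

    def compare(filename1, filename2):

        # Open both files in a with-block so they are closed afterwards.
        with open(filename1, "r") as file1, open(filename2, "r") as file2:

            # fillvalue='' avoids calling a string method on None
            # when one file has fewer lines than the other.
            for line1, line2 in zip_longest(file1, file2, fillvalue=''):
                # Remove only the trailing line break, keep other whitespace.
                line1, line2 = line1.rstrip('\n'), line2.rstrip('\n')  # <- HERE

                for char1, char2 in zip_longest(line1, line2, fillvalue=None):

                    if char1 == char2:
                        print(char2, end='')

                    elif char1 is None:
                        print('\u2588', end='')
                print()  # <- HERE

    compare('template.txt', 'target.txt')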
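
To see where the line break goes, it may help to look at the character pairs directly. The sketch below uses the first line of each file from the question as plain strings, so no files are needed. The names template_line, target_line and highlight are made up for this illustration. The highlight helper also marks every mismatch with a block, which is an assumption based on the question's description rather than what the original loop does.

```python
from itertools import zip_longest

# First line of each file as read from disk, including the trailing '\n'.
template_line = "First line\n"
target_line = "First lineXX\n"

# Pair up the characters exactly as the inner loop in the question does.
pairs = list(zip_longest(template_line, target_line, fillvalue=None))

# Only the last three pairs matter: the template's '\n' lines up with 'X',
# and the target's own '\n' lines up with a fill value.
tail = pairs[-3:]
print(tail)  # [('\n', 'X'), (None, 'X'), (None, '\n')]


def highlight(template, target, block="\u2588"):
    """Return target with every differing character replaced by block.

    Hypothetical helper: it removes the trailing line break first, so the
    caller decides where line breaks are written.
    """
    template = template.rstrip("\n")
    target = target.rstrip("\n")
    return "".join(
        c2 if c1 == c2 else block
        for c1, c2 in zip_longest(template, target, fillvalue=None)
    )


print(highlight(template_line, target_line))
```

None of the three trailing pairs is a matching '\n', so the original loop never prints a line break for that line. Returning a string instead of printing inside the loop also makes the comparison easy to check in isolation.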