Tags: python, python-3.x, string, file, generator

How do I join multiple lines of a file where the last character is '\'?


Below is the code I'm using to join lines in a file that end with a '\' character, and to remove lines that start with '#' as well as anything after a '#' within a line. It works fine, except for one test case:

def get_lines(path: str) -> Iterator[str]:
    
    output=''
    filehandle = open(path, 'r')
    lines = filehandle.readline()

    for lines in filehandle:
    
        if lines.startswith('#'):
            continue
        else:
            if lines.rstrip().endswith('\\'):
                next_line = next(filehandle)
                lines = lines.rstrip()[:-1] + next_line
            output += lines
            output = lines.split("#",1)
            lines = output[0]
            yield lines

a = (get_lines('C:\\Users\Swayam\Documents\ok2.txt'))

print (next(a))
print (next(a))
print (next(a))
print (next(a))
print (next(a))
print (next(a))

Below is the input file:

# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>

The output I get is:

<line0>
<line1>
<line2>
<line3.1 line3.2 \
line3.3>
<line4.1 line4.2>

The output I want is:

<line0>
<line1>
<line2>
<line3.1 line3.2 line3.3>
<line4.1 line4.2>

Notice that line 4 works fine but line 3 does not. What can I do to get this output?


Solution

  • There are a few things going on here. Line 4 works and line 3 does not because you only check for a line ending with '\' once. Line 3 has more than one continuation backslash, so after the first join the result still ends with '\' and is yielded before the third part is attached. This can be fixed by changing your '\' check from an 'if' to a 'while':

    def get_lines(path):
        with open(path, 'r') as filehandle:
            for line in filehandle:
                # Skip lines that are entirely comments.
                if line.startswith('#'):
                    continue
                # Keep joining for as long as the line ends with a backslash.
                while line.rstrip().endswith('\\'):
                    # The '' default avoids an error if the file ends on a backslash.
                    next_line = next(filehandle, '')
                    line = line.rstrip()[:-1] + next_line
                # Drop anything after a '#' on the joined line.
                yield line.split('#', 1)[0]

    # Raw string, so the backslashes in the Windows path are taken literally.
    a = get_lines(r'C:\Users\Swayam\Documents\ok2.txt')
    for x in a:
        print(x)

    

    This code produces:

    <line0>
    
    <line1>
    <line2>
    
    <line3.1 line3.2 line3.3>
    
    <line4.1 line4.2>
    

    While this is a valid solution, it might not always produce what you are looking for. Consider the case where a comment follows the backslash on a continued line:

    # this entire line is a comment - don't include it in the output
    <line0>
    # this entire line is a comment - don't include it in the output
    <line1># comment
    <line2>
    # this entire line is a comment - don't include it in the output
    <line3.1 \
    line3.2 \
    line3.3>
    <line4.1 \ #this will create a problem
    line4.2>
    

    That line no longer ends with '\', because the comment comes after it, so the same approach produces:

    <line0>
    
    <line1>
    <line2>
    
    <line3.1 line3.2 line3.3>
    
    <line4.1 \
    line4.2>
    

    If this is a case you expect, I would consider restructuring your code so that it does not only skip lines that start with "#", but also strips the comment from any line that contains one before checking for the trailing '\'.
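
One way that restructuring could look is sketched below. It is only an illustration, not the asker's code: the name `join_continued_lines` and the `SAMPLE` text are made up for the example, and it assumes that every '#' starts a comment (so a '#' inside real content would be cut off too) and that a comment-only line ends any continuation in progress.

```python
from typing import Iterable, Iterator


def join_continued_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield logical lines with comments removed and '\\' continuations joined."""
    pending = ''  # text collected from earlier continuation lines
    for raw in lines:
        # Remove the comment first, so a '#' after the backslash cannot hide it.
        text = raw.split('#', 1)[0].rstrip()
        if text.endswith('\\'):
            # Continuation: drop the backslash and keep collecting.
            pending += text[:-1]
            continue
        text = (pending + text).rstrip()
        pending = ''
        if text:  # skip comment-only and blank lines
            yield text
    if pending.rstrip():
        # The input ended on a continuation; emit what was collected.
        yield pending.rstrip()


# Hypothetical sample input, including the problematic "backslash then comment" line.
SAMPLE = r"""# this entire line is a comment
<line0>
<line1># comment
<line3.1 \
line3.2 \
line3.3>
<line4.1 \ #this will create a problem
line4.2>
"""

for line in join_continued_lines(SAMPLE.splitlines()):
    print(line)
# <line0>
# <line1>
# <line3.1 line3.2 line3.3>
# <line4.1 line4.2>
```

Because the function accepts any iterable of strings, an open file object can be passed in directly, for example `with open(path) as f: ...` followed by a loop over `join_continued_lines(f)`. Stripping each piece before joining also removes the trailing newlines, so `print` no longer emits the blank lines seen in the output above.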