Remove trailing blank lines from a file using file.write(line)


I've tried numerous answers here on Stack Overflow to remove the trailing blank lines from the 2.txt file (input):

2.txt file:

-11
B1
5
B1
-2
B1
7
B1
-11
B1
9
B1
-1
B1
-3
B1
19
B1
-22
B1
2
B1
1
B1
18
B1
-14
B1
0
B1
11
B1
-8
B1
-15

The only one that worked with print(line) was https://stackoverflow.com/a/6745865/10824251. But when I use f.write(line) instead of print(line), my final 2.txt file (output) looks like this:

2.txt file final:

-11B15B1-2B17B1-11B19B1-1B1-3B119B1-22B12B11B118B1-14B10B111B1-8B1-15
18
B1
-14
B1
0
B1
11
B1
-8
B1
-15

However, when I use print(line) instead of f.write(line), my bash terminal shows the output with the trailing blank lines removed (see the print(line) result below), but with the same deformation as the final 2.txt file. That is, removing the blank lines works, but the first lines are run together. I have tried to understand what is happening but have not made any progress.

print(line) result in the bash terminal:

-11B15B1-2B17B1-11B19B1-1B1-3B119B1-22B12B11B118B1-14B10B111B1-8B1-15
18
B1
-14
B1
0
B1
11
B1
-8
B1
-15

UPDATE:

My script, which removes the trailing blank lines of 2.txt but deforms the first lines in the bash terminal:

for line in open('2.txt'):
  line = line.rstrip()
  if line != '':
    print (line)

My script, which deforms the first lines of the 2.txt file and also does not delete the trailing blank lines as desired in the output file 3.txt:

with open("2.txt",'r+') as f:
  for line in open('3.txt'):
    line = line.rstrip()
    if line != '':
        f.write(line)

Solution

    Fixing the existing approach

    rstrip() removes the trailing newline along with any other trailing whitespace, so f.write(line) leaves the file position at the end of the same line, and the next write continues on that line.
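
    To see the effect in isolation, here is a minimal sketch that writes to in-memory buffers instead of files. The sample lines are made up for illustration and are not taken from 2.txt verbatim.

```python
import io

# Hypothetical sample input: three data lines followed by two blank lines.
lines = ["-11\n", "B1\n", "5\n", "\n", "\n"]

joined = io.StringIO()   # what f.write(line) alone produces
fixed = io.StringIO()    # what you get when the newline is written back

for line in lines:
    stripped = line.rstrip()              # drops "\n" and any other trailing whitespace
    if stripped != '':
        joined.write(stripped)            # no newline written: lines run together
        fixed.write(stripped + '\n')      # newline restored after each kept line

print(repr(joined.getvalue()))
print(repr(fixed.getvalue()))
```

    The first buffer shows the same run-together text as the start of the deformed output; the second keeps one line per value.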

    One way to fix it that's clear about what needs to change (all code unmodified but for addition of the last line):

    with open("2.txt",'r+') as f:
      for line in open('3.txt'):
        line = line.rstrip()
        if line != '':
            f.write(line)
            f.write('\n')  # one extra line; in text mode, '\n' is translated to the platform line ending
    

    Alternately, you could change f.write(line) to print(line, file=f).
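
    As a sketch of that print(line, file=f) variant: the function name and the in.txt / out.txt file names below are hypothetical. It also assumes the output goes to a separate file opened with 'w', which truncates it first. A file opened with 'r+' is not truncated, so any old content beyond what you write stays in place, which would be consistent with the leftover lines at the end of the deformed output in the question.

```python
import os
import tempfile

def copy_without_blank_lines(src_path, dst_path):
    """Copy src_path to dst_path, dropping blank lines and trailing whitespace."""
    with open(src_path) as src, open(dst_path, 'w') as dst:  # 'w' empties dst first
        for line in src:
            line = line.rstrip()
            if line != '':
                print(line, file=dst)   # print() appends the newline itself

# Usage with throwaway files (names are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.txt')
    dst = os.path.join(tmp, 'out.txt')
    with open(src, 'w') as f:
        f.write('-8\nB1\n-15\n\n\n')
    copy_without_blank_lines(src, dst)
    with open(dst) as f:
        result = f.read()

print(repr(result))
```

    Note that, like the loop in the question, this drops every blank line, not only the trailing ones.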


    Optimizing to run quickly on huge files

    If you need to trim a small number of blank lines from the end of an arbitrarily-large file, it makes sense to skip to the end of that file and work backwards; that way, you don't care how large the whole file is, but only how much content needs to be removed.

    That is, something like:

    import os, sys
    block_size = 4096 # 4kb blocks; decent chance this is your page size & disk sector size.
    filename = sys.argv[1] # or replace this with a hardcoded name if you prefer
    
    with open(filename, 'r+b') as f:   # seeking backwards only supported on files opened binary
        while True:
            f.seek(0, 2)                            # start at the end of the file
            offset = f.tell()                       # figure out where that is
            f.seek(max(0, offset - block_size), 0)  # move up to block_size bytes back
            offset = f.tell()                       # figure out where we are
            trailing_content = f.read()             # read from here to the end
            new_content = trailing_content.rstrip() # remove trailing whitespace
            if new_content == trailing_content:     # nothing to remove?
                break                               # then we're done.
            if new_content:                         # and if post-strip there's content...
                f.seek(offset + len(new_content))   # jump to its end...
                f.write(os.linesep.encode('utf-8')) # ...write a newline...
                f.truncate()                        # and then delete the rest of the file.
                break
            else:
                f.seek(offset, 0)                   # go to where our block started
                f.truncate()                        # and delete *everything* after it
                # run through the loop again, to see if there's still more trailing whitespace.
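
    If you would rather call this from other code than run it as a script, the same backwards scan could be wrapped in a function along these lines. This is a sketch, not a drop-in replacement: the function name is hypothetical, and it differs slightly from the script above in that it first locates the end of the real content and then truncates once, so a single final newline is restored even when the trailing whitespace spans several blocks.

```python
import os
import tempfile

def trim_trailing_whitespace(path, block_size=4096):
    """Strip trailing whitespace from the file at path, leaving one final newline."""
    with open(path, 'r+b') as f:
        f.seek(0, os.SEEK_END)
        original_end = end = f.tell()
        while end > 0:
            start = max(0, end - block_size)     # step back up to one block
            f.seek(start)
            kept = f.read(end - start).rstrip()  # bytes.rstrip() strips ASCII whitespace
            if kept:                             # found real content in this block
                end = start + len(kept)
                break
            end = start                          # block was all whitespace; keep going
        if end == original_end:                  # no trailing whitespace at all
            return
        f.seek(end)
        if end > 0:
            f.write(os.linesep.encode('utf-8'))  # restore a single line ending
        f.truncate()                             # drop everything after it

# Usage with a throwaway file and a tiny block size to exercise the loop
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'wb') as f:
        f.write(b'-8\nB1\n-15\n\n\n   \n\n')
    trim_trailing_whitespace(path, block_size=4)
    with open(path, 'rb') as f:
        result = f.read()

print(repr(result))
```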