Tags: python, arrays, sorting, startswith

Removing lines in my file that contain a certain variable in Python


My test.txt looks like this:

bear
goat
cat

What I'm trying to do is take the first line of the file, which is bear, find any lines that contain it, and delete them. The problem is that when I run my code, it deletes all of the contents of my output file.

import linecache
must_delete = linecache.getline('Test.txt', 1)
with open('output.txt','r+') as f:
    data = ''.join([i for i in f if not i.lower().startswith(must_delete)])
    f.seek(0)                                                         
    f.write(data)                                                     
    f.truncate()  


Solution

  • What you want is in-place editing, meaning reading and writing the same file, line by line. Python's fileinput module offers this ability.

    from __future__ import print_function
    import linecache
    import fileinput
    
    must_delete = linecache.getline('Test.txt', 1)
    
    for line in fileinput.input('output.txt', inplace=True):
        if line != must_delete:
            print(line, end='')
    

    Notes

    • The call to fileinput.input() includes the parameter inplace=True, which specifies in-place editing.
    • Within the for loop, because of the in-place editing, fileinput redirects standard output to the file, so print() writes to the file, not your console.
    • We need to call print() with end='' to avoid extra line-ending char(s), since each line already ends with its own newline. Alternatively, in Python 2 we can omit the from __future__ ... line and use the print statement like this (note the trailing comma):

      print line,
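
To see the redirection described in the notes in isolation, here is a minimal sketch, assuming Python 3 (where print() is already a function and fileinput.input() can be used as a context manager). The helper name drop_exact_line and the temporary demo file are hypothetical, made up for illustration only.

```python
import fileinput
import os
import tempfile


def drop_exact_line(path, unwanted):
    """Rewrite `path` in place, dropping every line equal to `unwanted`."""
    # With inplace=True, fileinput moves the original file aside as a backup
    # and redirects standard output into a fresh file at `path`.
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            if line != unwanted:
                print(line, end='')  # written to the file, not the console


# Hypothetical demo data in a throwaway temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'output.txt')
with open(demo_path, 'w') as f:
    f.write('bear\ngoat\ncat\n')

# The comparison is exact, so the trailing newline must be included.
drop_exact_line(demo_path, 'bear\n')

with open(demo_path) as f:
    remaining = f.read()
```

After the call, the file should hold only the goat and cat lines. Once the with block exits, standard output points at the console again.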
      

    Update

    If you want to detect the presence of the first line (e.g. 'bear') anywhere within a line, there are two more things to do:

    1. In the previous code, I did not strip the newline from must_delete, so it might look like bear\n. Now we need to strip off the newline in order to test anywhere within the line.
    2. Instead of comparing the whole line with must_delete, we must do a substring test (must_delete in line) and keep only the lines where it fails: if must_delete not in line:

    Putting it all together:

    from __future__ import print_function
    import linecache
    import fileinput
    
    must_delete = linecache.getline('Test.txt', 1)
    must_delete = must_delete.strip()  # Additional Task 1
    
    for line in fileinput.input('output.txt', inplace=True):
        if must_delete not in line:  # Additional Task 2
            print(line, end='')
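
One possible cause of the "everything gets deleted" symptom is worth guarding against: linecache.getline() never raises and returns '' when the file cannot be read (for example, if the file is really named test.txt rather than Test.txt on a case-sensitive file system). An empty string is a substring of every line, and every line also starts with it, so every line would match and be removed. Whether that is what happened in the question is an assumption. The sketch below, for Python 3, wraps the same logic in a hypothetical helper, delete_lines_containing, that refuses to run with an empty search string.

```python
import fileinput
import linecache
import os
import tempfile


def delete_lines_containing(source_path, target_path):
    """Delete from target_path each line containing source_path's first line.

    Returns the search string, or None if no usable search string was found.
    """
    must_delete = linecache.getline(source_path, 1).strip()
    if not must_delete:
        # getline() returns '' when the file cannot be read, and '' is a
        # substring of every line, so carrying on would delete everything.
        return None
    with fileinput.input(target_path, inplace=True) as lines:
        for line in lines:
            if must_delete not in line:
                print(line, end='')  # goes to target_path, not the console
    return must_delete


# Hypothetical demo files in a throwaway temporary directory.
demo_dir = tempfile.mkdtemp()
source = os.path.join(demo_dir, 'Test.txt')
target = os.path.join(demo_dir, 'output.txt')
with open(source, 'w') as f:
    f.write('bear\ngoat\ncat\n')
with open(target, 'w') as f:
    f.write('brown bear\ngoat\nbear cub\ncat\n')

found = delete_lines_containing(source, target)
with open(target) as f:
    after_delete = f.read()

# A path that does not exist: the guard leaves output.txt untouched.
missing = delete_lines_containing(os.path.join(demo_dir, 'nope.txt'), target)
with open(target) as f:
    after_missing = f.read()
```

Returning None instead of raising is just one design choice. Raising a ValueError would make the mistake harder to overlook.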
    

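    Update 2

    To also count how many times must_delete appears in the file while deleting the matching lines, use str.count():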

    from __future__ import print_function
    import linecache
    import fileinput
    
    must_delete = linecache.getline('Test.txt', 1)
    must_delete = must_delete.strip()
    total_count = 0  # Total number of must_delete found in the file
    
    for line in fileinput.input('output.txt', inplace=True):
        # How many times must_delete appears in this line
        count = line.count(must_delete)
        if count == 0:  # Keep only the lines that do not contain must_delete
            print(line, end='')
        total_count += count  # Update the running total
    
    # total_count is now the number of times must_delete appears in the file.
    # It is not the number of deleted lines, because a line might contain
    # must_delete more than once.
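
Because a line may contain must_delete more than once, the number of occurrences and the number of deleted lines can differ. A minimal sketch that tracks both, assuming Python 3 and a non-empty search string, might look like this. The helper name filter_and_count and the demo data are hypothetical.

```python
import fileinput
import os
import tempfile


def filter_and_count(path, must_delete):
    """Delete lines containing must_delete; return (deleted_lines, occurrences)."""
    deleted_lines = 0  # number of lines removed from the file
    occurrences = 0    # number of times must_delete appeared in total
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            count = line.count(must_delete)  # non-overlapping matches
            if count == 0:
                print(line, end='')  # keep the line by writing it back
            else:
                deleted_lines += 1
                occurrences += count
    return deleted_lines, occurrences


# Hypothetical demo data: one line mentions 'bear' twice.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'output.txt')
with open(demo_path, 'w') as f:
    f.write('bear bear\ngoat\nbear\ncat\n')

deleted, total = filter_and_count(demo_path, 'bear')

with open(demo_path) as f:
    remaining = f.read()
```

With this demo data, two lines are deleted while 'bear' is counted three times, which is exactly the difference the comments above point out.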