Search code examples
pythoncsvpython-2.7moduledelete-row

How to Delete Rows CSV in python


I'm trying to compare two csv files (fileA and fileB), and remove any rows from fileA that are not found in fileB. I want to be able to do this without creating a third file. I thought I could do this using the csv writer module but now I'm second guessing myself.

Currently, I'm using the following code to record my comparison data from file B:

removal_list = set()
with open('fileB', 'rb') as file_b:
    reader1 = csv.reader(file_b)
    next(reader1)
    for row in reader1:
        removal_list.add((row[0], row[2]))

This is where I'm stuck and do not know how to delete the rows:

with open('fileA', 'ab') as file_a:
    with open('fileB', 'rb') as file_b:
        writer = csv.writer(file_a)
            reader2 = csv.reader(file_b)
            next(reader2)
            for row in reader2:
                if (row[0], row[2]) not in removal_list:
                # If row was not present in file B, Delete it from file A.
                #stuck here:  writer.<HowDoIRemoveRow>(row)

Solution

  • This solution uses fileinput with inplace=True, which writes to a temporary file and then automatically renames it at the end to your file name. You can't remove rows from a file but you can rewrite it with only the ones you want.

    if the keyword argument inplace=1 is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.

    fileA

    h1,h2,h3
    a,b,c
    d,e,f
    g,h,i
    j,k,l
    

    fileB

    h1,h2,h3
    a,b,c
    1,2,3
    g,h,i
    4,5,6
    

    import fileinput, sys, csv
    
    with open('fileB', 'rb') as file_b:
        r = csv.reader(file_b)
        next(r) #skip header
        seen = {(row[0], row[2]) for row in r}
    
    f = fileinput.input('fileA', inplace=True) # sys.stdout is redirected to the file
    print next(f), # write header as first line
    
    w = csv.writer(sys.stdout) 
    for row in csv.reader(f):
       if (row[0], row[2]) in seen: # write it if it's in B
           w.writerow(row)
    

    fileA

    h1,h2,h3
    a,b,c    
    g,h,i