Search code examples
python-2.7csvfor-loopnested-loops

Compare two CSV files in Python


I have two CSV files as follows:

CSV1:

**ID  Name  Address  Ph**
  1   Mr.C   dsf     142
  2   Ms.N   asd     251
  4   Mr.V   fgg     014
  12  Ms.S   trw     547

CSV2:

**ID  Name  Service  Day**
  1   Mr.C   AAA     Mon
  2   Ms.N   AAA     Mon
  2   Ms.N   BBB     Tue
  2   Ms.N   AAA     Sat

As you can see very quickly CSV1 file is unique in having only 1 instance of every ID whilst CSV2 has repeats.

I am trying to match two CSV files based on ID and then wherever they match adding to CSV2 file the Address and Ph fields from CSV1. This is then saved as a new output file leaving the two original CSV files intact.

I have written a code but here's what's happening:

  1. Either all the entries from CSV1 get added against the last row of CSV2
  2. Or all the entries from CSV2 get the same address details appended against them

Here's what I have done so far.

import csv
csv1=open('C:\csv1file.csv')
csv2=open('C:\csv2file.csv')
csv1reader=csv.reader(csv1)
csv2reader=csv.reader(csv2)

outputfile=open('C:\mapped.csv', 'wb')
csvwriter=csv.writer(outputfile)

counter=0
header1=csv1reader.next()
header2=csv2reader.next()

csvwriter.writerow(header2+header1[2:4])

for row1 in csv1reader:
    for row2 in csv2reader:
        if row1[0]==row2[0]:
            counter=counter+1
        csvwriter.writerow(row2+row1[2:4])

I am running this code in Python 2.7. As you might have guessed the two different results that I am getting are based on the indentation of the csvwriter statement in the above code. I feel I am quite close to the answer and understand the logic but somehow the loop doesn't loop very well.

Can any one of you please assist?

Thanks.



Solution

  • The problem arises because the inner loop only works once. the reason for that is, because csv2reader will be empty after you run the loop once

    a way to fix this would be to make a copy of the rows in the second file and use that copy in the loop

    csvwriter.writerow(header2+header1[2:4])
    
    csv2copy=[]
    for row2 in csv2reader: csv2copy.append(row2)
    
    for row1 in csv1reader:
        for row2 in csv2copy:
            print row1,row2,counter
            if row1[0]==row2[0]:
                counter=counter+1
                csvwriter.writerow(row2+row1[2:4])