1,boss,30
2,go,35
2,nan,45
3,fog,33
4,kd,55
4,gh,56
Expected output:
1,boss,30
3,fog,33
That means my output file should be free from duplicates: I need to delete every record whose value in column 1 is repeated.
source_rd = csv.writer(open("Non_duplicate_source.csv", "wb"), delimiter=d)
gok = set()
for rowdups in sort_src:
    if rowdups[0] not in gok:
        source_rd.writerow(rowdups)
        gok.add(rowdups[0])
But the output I get is:
1,boss,30
2,go,35
3,fog,33
4,kd,55
What am I doing wrong?
You can just loop over the file twice.
The first time through, count how many times each value in column 1 occurs. The second time through, fetch only the rows whose value occurs exactly once.
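import csv

fn = "source.csv"  # hypothetical path; point this at your input file

# First pass: count how many times each column-1 value occurs.
gok = {}
with open(fn) as fin:
    reader = csv.reader(fin)
    for e in reader:
        gok[e[0]] = gok.get(e[0], 0) + 1

# Second pass: keep only the rows whose column-1 value occurs exactly once.
with open(fn) as fin:
    reader = csv.reader(fin)
    for e in reader:
        if gok[e[0]] == 1:
            print(e)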
Prints:
['1', 'boss', '30']
['3', 'fog', '33']
The reason your method does not work is that once the second instance of the duplicate is seen, the first has already been written.
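If the file fits in memory, the same count-then-filter idea can be sketched with `collections.Counter` and the result written back out as CSV. This is only an illustration under a few assumptions: it targets Python 3, the helper name `drop_repeated_keys` is made up for this example, and the sample data is taken to be the first six rows shown in the question, held in a string rather than read from a file.

```python
import csv
import io
from collections import Counter


def drop_repeated_keys(rows, key_index=0):
    """Return only the rows whose key column value occurs exactly once.

    Hypothetical helper for illustration; not part of the csv module.
    """
    rows = list(rows)  # materialize so the rows can be scanned twice
    counts = Counter(row[key_index] for row in rows)
    return [row for row in rows if counts[row[key_index]] == 1]


# Sample input (assumed to be the six input rows from the question).
sample = "1,boss,30\n2,go,35\n2,nan,45\n3,fog,33\n4,kd,55\n4,gh,56\n"
unique_rows = drop_repeated_keys(csv.reader(io.StringIO(sample)))

# Write the surviving rows as CSV text; a real file opened with
# open(path, "w", newline="") could be passed to csv.writer instead.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(unique_rows)
result = out.getvalue()
print(result)
```

Unlike the `set`-based loop in the question, nothing is written until every key has been counted, so the first occurrence of a repeated key is never emitted.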