In my first Python program I try to find duplicate values in a list loaded from a CSV file (over 98,000 rows, each with 5 columns) and store them in a list of objects (I use only 2 of the columns; the cnt field holds the number of duplicate values):
class Duplication:
    def __init__(self, pn, comp, cnt):
        self.pn = pn
        self.comp = comp
        self.cnt = cnt

    def __str__(self):
        return f'{self.pn};{self.comp};{self.cnt}\n'

    def __repr__(self):
        return str(self)

    def __hash__(self):
        return hash(('pn', self.pn,
                     'competitor', self.comp))

    def __eq__(self, other):
        return self.pn == other.pn and self.comp == other.comp
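The loading code is not shown above, so here is only a minimal sketch of one way duplicates could be built, assuming the csv module, a placeholder file name input.csv, a semicolon delimiter, and that pn and comp are the first two columns (all of these are assumptions):

import csv
from collections import Counter

# One linear pass to count occurrences of each (pn, comp) pair --
# with ~98,000 rows this avoids rescanning the list per element.
counts = Counter()
with open('input.csv', newline='') as f:    # placeholder file name
    reader = csv.reader(f, delimiter=';')   # assumed delimiter
    for row in reader:
        counts[(row[0], row[1])] += 1       # assumed: pn, comp are columns 0 and 1

duplicates = [Duplication(pn, comp, cnt)
              for (pn, comp), cnt in counts.items()]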
After that I select only the entries that occur more than once and try to save the duplicate objects into a new CSV file:
results = [d for d in duplicates if d.cnt > 1]
results = set(results)
with open(f'fileName.csv', 'a') as f:
    f.writelines('=== Info Duplications to Delete ===\n')
    for line in results:
        f.writelines(print(line))
    f.close()
print(results)
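As a side note, set(results) deduplicates by (pn, comp) only, because those are the fields __hash__ and __eq__ above are defined over; two objects with the same pn and comp but different cnt collapse into one entry. A small demonstration with made-up values:

a = Duplication('PN-1', 'CompA', 2)   # hypothetical sample values
b = Duplication('PN-1', 'CompA', 5)

print(a == b)       # True -- __eq__ compares only pn and comp
print(len({a, b}))  # 1    -- the set keeps one object; cnt is ignored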
I got this error, but results contains over 7,000 values that I want to save into the CSV file. When I have a smaller list of under 100 values, the data is saved, but not with this file's large number of rows.
I checked the data in the file and also in the debugger, and there is no None value or anything else that looks like invalid data or a source of the problem.
UPDATE
After changing it to:
with open(f'file.csv', 'a') as f:
    f.writelines('===Info ===\n')
    f.writelines(results)
    #for line in results:
    #    f.writelines(print(line))
    f.close()
print(results)
running this script takes over 20 minutes.
OK, the solution was simple. I only changed:
f.writelines(print(line))
to:
f.writelines(str(line))
Now everything works fine.
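For reference: print(line) writes the text to the console and returns None, so f.writelines(print(line)) ends up calling f.writelines(None), which raises a TypeError. A minimal sketch of the corrected writing block (the explicit f.close() is also unnecessary, since the with statement closes the file automatically):

results = set(d for d in duplicates if d.cnt > 1)

with open('fileName.csv', 'a') as f:
    f.write('=== Info Duplications to Delete ===\n')
    # str(line) calls Duplication.__str__, which already ends in '\n',
    # so each object becomes one finished CSV line.
    f.writelines(str(line) for line in results)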