I have two excel columns, name and company, and I'm trying to figure out what's the easiest way to make sure to obtain as an output a list of tuples (name, company) without any duplicates
The following code seems correct to me but for some reason it's not working, probably because of some dumb error I can't seem to find.
updated = openpyxl.load_workbook('abc.xlsx')
u_wb = updated.get_sheet_by_name('SP_Table')
u_names = u_wb['F'] #column F is where the names are
u_company = u_wb['C'] #column C is where the company's name are
l=[]
for x in range(len(u_names)-1):
i=x
i+=1
if u_company[x].value==None #in case a field is missing
continue
if i==len(u_names):
break
for z in l:
r=(u_names[x].value, u_names[x].value)
if r==z:
continue
else:
t=(u_names[x].value, u_company[x].value)
l.append(t)
print("Number of contacts:", len(l))
I don't get any error and the contact's count is actually reduced but only because of the u_company[x].value==None
clause.
Any help or resource is appreciated
The condition on which you're trying to ignore duplicates is not correct.
You are adding to the list couples of (u_names[x].value, u_company[x].value)
. That is OK and makes sense. The problem is that you're checking if (u_names[x].value, u_names[x].value)
is already in the list.
Apart from that, even if it was the same, you're simply doing nothing when you find a duplicate. Your else statement after the for
will always execute! This is because else statement after a for
loop happens when the loop came to conclusion without hitting a break
statement. So, what you want to do is:
for x in range(len(u_names)):
if u_company[x].value==None #in case a field is missing
continue
r = (u_names[x].value, u_company[x].value)
if r in l:
continue
else:
l.append(t)
print("Number of contacts:", len(l))