Tags: python, python-3.x, openpyxl

Openpyxl: Removing Duplicate cells from a column


I am trying to remove the duplicate entries from a column using openpyxl and write the unique entries to a different workbook.

Input File:

Cust1
Cust1
Cust1
Cust2
Cust2
Cust3

Expected Output is:

Cust1
Cust2
Cust3

Code:

import openpyxl

wb1 = openpyxl.load_workbook('OldFile.xlsx')
ws = wb1.active
wb2 = openpyxl.Workbook()
ws2 = wb2.active
k=1
new_row1 = []
for i in range(2, ws.max_row + 1 ):
  new_row1.append([])                   #list for storing the unique entries
  row_name = ws.cell(row=i,column=1).value  #taking the 1st cell's value
  new_row1[k].append(row_name)              #Appending the list
  ws2.append(new_row1)                      #writing to new workbook
  k+=1                                     
  for j in range(3, ws.max_row + 1 ):
    row_name2 = ws.cell(row=j, column=1).value #taking 2nd cell's value
    if row_name == row_name2:                  #comparing both the values
      i+=1                                      
      j+=1
wb2.save('NewFile.xlsx')

I am getting "IndexError: list index out of range" on the line "new_row1[k].append(row_name)". Apart from that error, is there anything else that has to be changed to get the required output?


Solution

  • As @CharlieClark said, your code is overly complicated. Try this instead:

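    import openpyxl

    wb1 = openpyxl.load_workbook('OldFile.xlsx')
    ws1 = wb1.active  # keep naming convention consistent
    wb2 = openpyxl.Workbook()
    ws2 = wb2.active

    values = []  # unique values, in order of first appearance
    for i in range(2, ws1.max_row + 1):  # starts at row 2, as in the question
        value = ws1.cell(row=i, column=1).value
        if value not in values:  # skip anything already collected
            values.append(value)

    for value in values:
        ws2.append([value])  # append() takes an iterable: one item per cell

    wb2.save('NewFile.xlsx')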
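
On the IndexError itself: after the first `new_row1.append([])` the list holds exactly one element, at index 0, while `k` starts at 1, so `new_row1[k]` is out of range on the very first pass. Separately, the loop above can be written more compactly. The following is only a sketch under stated assumptions, not a definitive implementation: it relies on `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7), instead of scanning a list with `in` for every row. The names `unique_in_order` and `copy_unique_column`, the `first_row=2` header assumption, and the default column are illustrative choices that do not come from the question.

```python
def unique_in_order(values):
    """Return the distinct items of `values`, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(values))


def copy_unique_column(src_path, dst_path, column=1, first_row=2):
    """Write the unique values of one column of src_path into dst_path.

    Hypothetical helper: first_row=2 assumes row 1 is a header, as the
    range(2, ...) in the question suggests.
    """
    # Imported here so unique_in_order() can be used without openpyxl.
    import openpyxl

    ws1 = openpyxl.load_workbook(src_path).active
    # With values_only=True each row is a tuple of plain cell values;
    # restricting min_col/max_col to one column makes it a 1-tuple.
    cells = (
        row[0]
        for row in ws1.iter_rows(
            min_row=first_row, min_col=column, max_col=column, values_only=True
        )
    )

    wb2 = openpyxl.Workbook()
    ws2 = wb2.active
    for value in unique_in_order(cells):
        ws2.append([value])  # one value per row, written to column A
    wb2.save(dst_path)


print(unique_in_order(["Cust1", "Cust1", "Cust1", "Cust2", "Cust2", "Cust3"]))
# ['Cust1', 'Cust2', 'Cust3']
```

Calling `copy_unique_column('OldFile.xlsx', 'NewFile.xlsx')` should then produce the expected output, assuming an openpyxl version recent enough to support `values_only` in `iter_rows`.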