Search code examples
pythonmergeopenpyxl

Openpyxl: Identify same values in first column and merge other column(s)


My excel sheet has ID-numbers in column A but at times, they are the same. In this case, I want the program to take the text in column I and concatenate it. Column I can have 1 to n rows with the same ID

Example:

From here

Column A Column I
197 this is example no 1
198 this is
198 example no 2
199 this is example no 3

to here

Column A Column I
197 this is example no 1
198 this is example no 2
199 this is example no 3

It's not important whether column A remains two rows or merges into one, what's central is that column I merges.

I have tried iterating over each row checking whether the row before and after are the same. But what I struggle with is to put the newly joined list in another column. Right now, the excel list I get doesn't look different to the source.

mylist = []
for cell in sheet['A']:
    mylist.append(cell.value)

textlist = []
for cell in sheet['I']:
    textlist.append(str(cell.value))

mergecount=0
startcell=1
sheet.insert_cols(10)


for row in range(1, len(mylist)):
    print(row, mylist[row-1], mylist[row])
    if mylist[row-1] == mylist[row]:
        mergecount += 1
    else:
        print(row, mylist[row-1], mylist[row], startcell, mergecount)
        if mergecount > 0:
            for text in textlist:
                newcells = ' '.join(text)
                cell.value = newcells
        mergecount = 0
        startcell = row+1
if mergecount > 0:
    for text in textlist:
        newcells = ' '.join(text)
        cell.value = newcells
workbook.save(file)

Solution

  • I have rewritten a little bit the code trying to imitate your style and name of variables to make it easier to understand.

    For the first part we can use the same capture of the data on lists that you have done:

    import openpyxl
    from openpyxl import load_workbook
    
    file = "Book1.xlsx"
    workbook = load_workbook(file)
    sheet = workbook.active
    
    mylist = []
    for cell in sheet['A']:
        mylist.append(cell.value)
    
    textlist = []
    for cell in sheet['I']:
        textlist.append(str(cell.value))
    

    After, we can create 2 new lists to save the new IDs and new text, I would iterate from 0 as a good practice instead of iterating from "1":

    mylist_new = []
    textlist_new = []
    for row in range(0, len(mylist)):
        if row == 0:
            newtext = textlist[row]        
        else:
            if mylist[row-1] == mylist[row]:
                # The operator '+' is going to work to concatenate strings:
                newtext = newtext + ' ' + textlist[row]
            else:
                mylist_new.append(mylist[row-1])
                textlist_new.append(newtext)
                newtext = textlist[row]
    
    # I need these last 2 lines to save the last values of both: IDs and text:
    mylist_new.append(mylist[-1])
    textlist_new.append(newtext)
    

    Finally, you can save the data from the new lists on 2 different columns (in this example I will use the same worksheet on columns J (10th) and K (11th):

    for row in range(0, len(mylist_new)):
        sheet.cell(row=row+1, column=10).value = mylist_new[row]
    
    for row in range(0, len(textlist_new)):
        sheet.cell(row=row+1, column=11).value = textlist_new[row]
    
    workbook.save(file)
    

    I hope it can help you.