Search code examples
pythonopenpyxlcopy-paste

How do I loop through each source file and copy a specific column into a new workbook with each new "paste" shifting to the adjacent column?


I have 3 Excel files with a column of data in cells A1 to A10 (the "Source Cells") in each workbook (on sheet 1 in each workbook). I would like to copy the data from the Source Cells into a new workbook, but the data must shift into a new column each time.
For example:

  1. the Source Cells in File 1 must be copied to cells A1 to A10 in the new workbook;
  2. the Source Cells in File 2 must be copied to cells B1 to B10 in the new workbook; and
  3. the Source Cells in File 3 must be copied to cells C1 to C10 in the new workbook.

I'm struggling to figure the best way to adjust "j" in my code on each iteration. I'm also not sure what the cleanest way is to run each function for the different source files.

All suggestions on how to make this code cleaner will also be appreciated because I admit it's so messy at the moment!

Thanks in advance!

import openpyxl as xl

filename_1 = "C:\\workspace\\scripts\\file1.xlsx"
filename_2 = "C:\\workspace\\scripts\\file2.xlsx"
filename_3 = "C:\\workspace\\scripts\\file3.xlsx"

destination_filename = "C:\\workspace\\scripts\\new_file.xlsx"

num_rows = 10
num_columns = 1


def open_source_workbook(path):
    '''Open the workbook and worksheet in the source Excel file'''
    workbook = xl.load_workbook(path)
    worksheet = workbook.worksheets[0]
    return worksheet

def open_destination_workbook(path):
    '''Open the destination workbook I want to copy the data to.'''
    new_workbook = xl.load_workbook(path)
    return new_workbook

def open_destination_worksheet(path):
    '''Open the worksheet of the destination workbook I want to copy the data to.'''
    new_worksheet = new_workbook.active
    return new_worksheet

def copy_to_new_file(worksheet, new_worksheet):
    for i in range (1, num_rows + 1):
        for j in range (1, num_columns + 1):
            c = worksheet.cell(row = i, column = j)
            new_worksheet.cell(row = i, column = j).value = c.value

worksheet = open_source_workbook(filename_1)
new_workbook = open_destination_workbook(destination_filename)
new_worksheet = open_destination_worksheet(new_workbook)
copy_to_new_file(worksheet, new_worksheet)
new_workbook.save(str(destination_filename))

Solution

  • Question: Loop files, copy a specific column, with each new “paste” shifting to the adjacent column?

    This approach first aggregates from all files the Column Cell values.
    Then rearange it so, that it can be used by the openpyxl.append(... method.
    Therefore, no target Column knowledge are needed.


    Reference:


    1. Used imports

      import openpyxl as opxl
      from collections import OrderedDict
      
    2. Define the files in a OrderedDict to retain file <=> column order

      file = OrderedDict.fromkeys(('file1', 'file2', 'file3'))
      
    3. Define the Range as index values. Convert the Excel A1 notation into index values

      min_col, max_col, min_row, max_row =  
          opxl.utils.cell.range_to_tuple('DUMMY!A1:A10')[1]
      
    4. Loop the defined files,
      load every Workbook and get a reference to the default Worksheet
      get the cell values from the defined range:
      min_col=1, max_col=1, min_row=1, max_row=10

      for fname in file.keys(): 
          wb = openpyxl.load_workbook(fname)
          ws = wb.current()
          file[fname] =  ws.iter_rows(min_row=min_row, 
                                      max_row=max_row,
                                      min_col=min_col,
                                      max_col=max_col, 
                                      values_only=True)
      
    5. Define a new Workbook and get a reference to the default Worksheet

      wb2 = opxl.Workbook()
      ws2 = wb2.current()
      
    6. Zip the values, Row per Row from all files
      Map the ziped list of tuples using a lambda to flatten to a list of Row values.
      Append the list of values to the new Worksheet

      for row_value in map(lambda r:tuple(v for c in r for v in c), 
                           zip(*(file[k] for k in file))
                          ):
          ws2.append(row_value)
      
    7. Save the new Workbook

      # wb2.save(...)