I have 3 Excel files with a column of data in cells A1 to A10 (the "Source Cells") in each workbook (on sheet 1 in each workbook). I would like to copy the data from the Source Cells into a new workbook, but the data must shift into a new column each time.
For example:
I'm struggling to figure the best way to adjust "j" in my code on each iteration. I'm also not sure what the cleanest way is to run each function for the different source files.
All suggestions on how to make this code cleaner will also be appreciated because I admit it's so messy at the moment!
Thanks in advance!
import openpyxl as xl
filename_1 = "C:\\workspace\\scripts\\file1.xlsx"
filename_2 = "C:\\workspace\\scripts\\file2.xlsx"
filename_3 = "C:\\workspace\\scripts\\file3.xlsx"
destination_filename = "C:\\workspace\\scripts\\new_file.xlsx"
num_rows = 10
num_columns = 1
def open_source_workbook(path):
'''Open the workbook and worksheet in the source Excel file'''
workbook = xl.load_workbook(path)
worksheet = workbook.worksheets[0]
return worksheet
def open_destination_workbook(path):
'''Open the destination workbook I want to copy the data to.'''
new_workbook = xl.load_workbook(path)
return new_workbook
def open_destination_worksheet(path):
'''Open the worksheet of the destination workbook I want to copy the data to.'''
new_worksheet = new_workbook.active
return new_worksheet
def copy_to_new_file(worksheet, new_worksheet):
for i in range (1, num_rows + 1):
for j in range (1, num_columns + 1):
c = worksheet.cell(row = i, column = j)
new_worksheet.cell(row = i, column = j).value = c.value
worksheet = open_source_workbook(filename_1)
new_workbook = open_destination_workbook(destination_filename)
new_worksheet = open_destination_worksheet(new_workbook)
copy_to_new_file(worksheet, new_worksheet)
new_workbook.save(str(destination_filename))
Question: Loop files, copy a specific column, with each new “paste” shifting to the adjacent column?
This approach first aggregates from all files the Column Cell values.
Then rearange it so, that it can be used by the openpyxl.append(...
method.
Therefore, no target Column knowledge are needed.
Reference:
class collections.OrderedDict([items])
Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations.
openpyxl.utils.cell.coordinate_to_tuple(coordinate)
Convert an Excel style coordinate to (row, colum) tuple
iter_rows(min_row=None, max_row=None, min_col=None, max_col=None, values_only=False)
Produces cells from the worksheet, by row. Specify the iteration range using indices of rows and columns.
Return an iterator that applies function to every item of iterable, yielding the results.
Make an iterator that aggregates elements from each of the iterables.
Used imports
import openpyxl as opxl
from collections import OrderedDict
Define the files in a OrderedDict
to retain file <=> column order
file = OrderedDict.fromkeys(('file1', 'file2', 'file3'))
Define the Range as index values. Convert the Excel A1 notation into index values
min_col, max_col, min_row, max_row =
opxl.utils.cell.range_to_tuple('DUMMY!A1:A10')[1]
Loop the defined files,
load every Workbook and get a reference to the default Worksheet
get the cell values from the defined range:
min_col=1, max_col=1, min_row=1, max_row=10
for fname in file.keys():
wb = openpyxl.load_workbook(fname)
ws = wb.current()
file[fname] = ws.iter_rows(min_row=min_row,
max_row=max_row,
min_col=min_col,
max_col=max_col,
values_only=True)
Define a new Workbook and get a reference to the default Worksheet
wb2 = opxl.Workbook()
ws2 = wb2.current()
Zip the values, Row per Row from all files
Map the ziped list of tuples using a lambda
to flatten to a list of Row values.
Append the list of values to the new Worksheet
for row_value in map(lambda r:tuple(v for c in r for v in c),
zip(*(file[k] for k in file))
):
ws2.append(row_value)
Save the new Workbook
# wb2.save(...)