Search code examples
pythonexcelpandasopenpyxlxlwings

How can combine or merge all worksheets within an Excel file into one worksheet using python?


I'm trying to merge all worksheet tabs of each excel file within a provided file path into one worksheet. For Example if there's 5 Excel files with multiple amounts worksheet tabs, each Excel file now only contains one merged worksheet tab. I would like to append each of these merged worksheet tabs to a created output file.

My Code below is able to appended all worksheet tabs from the provided source files to an output file but I can't figure out how to merge the worksheet tabs from the source first first. Does anyone have an Idea how I can accomplish this? Please see my code below - Thanks!:

import glob
import os
import pandas as pd
import sys
import os.path
from openpyxl import load_workbook
from openpyxl import Workbook

#System Arguments
folder = sys.argv[1]
inputFile = sys.argv[2]
outputFile = sys.argv[3]

# specifying the path to xlsx files
path = r""+folder+""

#Create the new Excel Workbook with nameing convention provided by user
def create_file():
    wb = Workbook()
    wb.save(outputFile)

#Append the NEW or EXISTING Workbook with Input Files and Tabs to the already existing Excel File
def appened_file():
    outputPath = outputFile
    book = load_workbook(outputPath)
    writer = pd.ExcelWriter(outputPath, engine = 'openpyxl', mode="a", if_sheet_exists="new")
    writer.book = book
    for filename in glob.glob(path + "*" + inputFile + "*"):
            print(filename)
            excel_file = pd.ExcelFile(filename)
            (_, f_name) = os.path.split(filename)
            (f_short_name, _) = os.path.splitext(f_name)
            for sheet_name in excel_file.sheet_names:
                df_excel = pd.read_excel(filename, sheet_name=sheet_name,engine='openpyxl')
                df_newSheets = pd.DataFrame(df_excel)
                df_newSheets.to_excel(writer, sheet_name, index=False)                   
    writer.save()





Solution

  • If you want to do what is said in the title, you could do this solely with pandas, as pd.read_excel(path_input, sheet_name=None) can read all worksheets of a workbook in one pass:

    import pandas as pd
    
    path_input = r"test.xlsx"
    path_output = r"finished.xlsx"
    
    df_lst = pd.read_excel(path_input, sheet_name=None).values()
    df_res = pd.concat(
        [df_.transpose().reset_index().transpose() for df_ in df_lst])
    df_res.to_excel(path_output, index=False, header=False)
    

    It would also be possible to do this with xlwings or openpyxl, but usually pandas is fast.

    Example with data
    Assume an Excel workbook with three worksheets.

    Worksheet1:

    a   b   c
    foo cor wal
    bar gra plu
    baz ult xyz
    qux ply thu
    

    Worksheet2:

    u   v   w   x   y   z
    12  92  86  22      80
    23  29      74      21
    16  10  75  67  61  99
    

    Worksheet3:

    I   II  III IV
    1   5   9   1
    2   6   0   6
    3   7       3
    4   8   2   0
    

    Final output (after executing this snippet, i.e. after to_excel):

    a   b   c
    foo cor wal
    bar gra plu
    baz ult xyz
    qux ply thu
    u   v   w   x   y   z
    12  92  86  22      80
    23  29      74      21
    16  10  75  67  61  99
    I   II  III IV
    1   5   9   1
    2   6   0   6
    3   7       3
    4   8   2   0