Search code examples
pythonexcelpandasxlsxopenpyxl

How do I use Pandas for reading multiple xlsx files and outputting into one in individual file in multiple sheets?


The .xlsx files are all found in one directory. There is only .xlsx file in this directory. I need to take each of the individual .xlsx files and insert it into a single sheet.

The example looks like this:

  1. Directory has 4 .xlsx files
  2. Read all 4 .xlsx files
  3. Put all 4 .xlsx files into one single file
  4. Each file should represent one sheet.

The final result should be one Excel file with 4 sheets.


Solution

  • The process for doing this is:

    0. Setup

    Install required packages:

    pip install pandas
    pip install xlsxwriter
    

    Then import pandas into the Python file you're working in:

    import pandas as pd
    

    1. Read in the .xlsx files

    a. Each by name:

    df1 = pd.read_excel('./excelfile1.xlsx')
    

    etc

    b. Read all in current directory in:

    import os, re
    dfs = []
    for fname in os.listdir():
        if re.search(r'\.xlsx$', fname):
            dfs.append(pd.read_excel(fname))
    

    2. Create a new file and add existing files as sheets

    writer = pd.ExcelWriter('./newfilename.xlsx', engine='xlsxwriter')
    sheet_names = ['sheet1', ...]
    for df, sheet_name in zip(dfs, sheet_names):
        df.to_excel(writer, sheet_name=sheet_name)
    writer.save()
    

    This will create a new Excel file in the current directory called newfilename.xlsx with each of your existing Excel files as sheets.