Search code examples
pythonexcelpandaspython-os

Importing multiple exce files including sheets into 1 excel file


I have 5 Excel files, each file contains 8 sheets, each sheet contains around 30 rows. So this means 1 file has in total 30 x 8 = 240 rows. Is there a speedy trick I can use to combine all these 5 files (including sheets) into 1 Excelfile with in total 240 x 5 rows = 1200 rows?

This is my current code:

import os
import pandas as pd

files = os.listdir('c:\data\KM\Desktop\my_folder') 
 
os.chdir(r'c:\data\KM\Desktop\my_folder')

df = pd.DataFrame()

for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file))
df.head() 
df.to_excel('all_files.xlsx')  

Now with this code I have 2 problems:

  1. From all the files I just get the 1st sheet. So its merges 8 sheets in total instead of 40 (8x5) :(
  2. For every file it copies also the column headers, this needs to be done just for the 1st file. All the files and sheets have the same column names.

Appreciate your help all.


Solution

  • Use read_excel with sheet_name=None for all sheets, join together by concat for list of DataFrames and last use it again for one big DataFrame:

    import glob
    
    files = glob.glob(r'c:/data/KM/Desktop/my_folder/*.xlsx') 
    dfs = (pd.concat(pd.read_excel(fp, sheet_name=None)) for fp in files)
    dfbig = pd.concat(dfs, ignore_index=True)
    

    Edit: For remove last sheetname convert orderdict to lists of DataFrame and remiove last by indexing:

    files = glob.glob(r'D:/Dropbox/work-joy/so/files/*.xlsx')
    dfs = (pd.concat([v for k, v in pd.read_excel(fp, sheet_name=None).items()][:-1]) 
                     for fp in files)
    df = pd.concat(dfs, ignore_index=True)