Search code examples
pythonexcelpandasdata-extractiondata-ingestion

Extracting multiple excel files as Pandas data frame


I'm trying to create a data ingestion routine to load data from multiple excel files with multiple tabs and columns in the pandas data frame. The structuring of the tabs in each of the excel files is the same. Any help would be appreciated!!

folder = "specified_path"
files = os.listdir(folder)
sheet_contents = {}

for file in files:
    data = pd.ExcelFile(folder+file)
    file_data = {}

    for sheet in data.sheet_names:
        file_data[sheet] = data.parse(sheet)

    sheet_contents[file[:-5]] = file_data

Solution

  • One of the ways to create a dataframe for each excelfile (stored in a specific folder and that holds multiple sheets) is by using pandas.read_excel and pandas.concat combined. By passing the parameter sheet_name=None to pandas.read_excel, we can read in all the sheets in the excelfile at one time.

    Try this :

    import os
    import pandas as pd
    
    folder = 'specified_path'
    
    excel_files = [file for file in os.listdir(folder)]
    
    list_of_dfs = []
    for file in excel_files :
        df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None), ignore_index=True)
        df['excelfile_name'] = file.split('.')[0]
        list_of_dfs.append(df)
    

    To access to one of the dataframes created, you can use its index (e.g, list_of_dfs[0]) :

    print(type(list_of_dfs[0]))
    <class 'pandas.core.frame.DataFrame'>