Search code examples
pythonpandasxlsx

read all sheets excels in python


I have many excel files in a folder. I want to load all sheets except the first sheet into a dataframe. Each excel has different number of sheets. For example I have E1.xlsx, E2.xlsx. E1 has 5 sheets, and E2 has 9. I need to load sheet 1,2,3,4 from E1, and 1, 2, 3, 4, 5, 6, 7, 8 from E2 (without sheet 0).


filenames = glob.glob( "/*.xlsx")
d = []

for filename in filenames:
    df = pd.read_excel(filename, index_col=None, sheet_name=1) # don't know how to set sheet number
    d.append(df)

data = pd.concat(d, axis=0, ignore_index=True)

I have my code here, but don't know how to set the sheet number. (the column in sheet is totally different from other sheet, I think it's not a good choice to read all sheets and delete)


Solution

  • Can you try the following:

    import glob
    import pandas as pd
    
    filenames = glob.glob( "/*.xlsx")
    d = []
    not_req_sheets = [0]
    
    for filename in filenames:
        xl = pd.ExcelFile(filename)
        for sheet_name in xl.sheet_names:
            if sheet_name not in not_req_sheets:
                df = xl.parse(sheet_name, index_col=None)
                d.append(df)
    
    data = pd.concat(d, axis=0, ignore_index=True)