Search code examples
pythonexcelpandasloopsbatch-processing

Reading a folder with multiple excel files which contain more than 15 sheets each into a loop to be processed for feature extraction


community. I hope you can offer some guidance as I am new to python programming

I am trying to read a folder that contains 15 excel files and each excel file has 30 worksheets. I am trying to read each excel worksheet separately because I need to extract the features from the 30 sheets. I need to read the excel sheets using a pandas data frame, however, when I try to do it appears empty. I am not sure how to proceed from here, I have tried to place just df into the df_frame but I get an error. This is the error This is what I get when I try to print the data frame, it is empty!

This is essentially what I get when I print everything, you can see it prints the 15 excel sheets and the 30 worksheets but the pandas data frame are empty

Get the current working directory print("The current working directory is:")

print(os.getcwd())

folder path

path = r'C:\Users' excel_files = glob.glob(os.path.join(path, "*.xlsx"))

loop over the list of excel files

for files in excel_files:

# read the csv file
df = pd.read_excel(files,sheet_name=None)
df_frame = pd.DataFrame(df,columns= ['Time','Sensor'])  


# print the location and filename
print (df_frame)
display(df_frame)
#print('Location:', files)
#print('File Name:', files.split("\\")[-1])
  
# print the content
#print('Content:')
#display(df)

print()`

I have tried to work around the message error, however, I don't know how to make the df_frame = pd.DataFrame(df,columns= ['Time','Sensor']) to be applied to all the excel worksheet in all the excel files.

P.s I don't need to join the excel worksheets into 1 single sheet at the moment. I just want to be able to read separately to be able to proceed with the feature extraction and then join the extracted features together.


Solution

  • The line df = pd.read_excel(files,sheet_name=None) creates a dictionary, with the dictionary keys being the string name of each sheet and the dictionary values being the data found in each sheet in a pandas data frame.

    You can loop through like so:

    df_dict = pd.read_excel(files,sheet_name=None)
    for sheet_name, df in df_dict.items():
        ## Rest of code here