Search code examples
pythondataframedirectoryiterationmultiple-file-upload

How to run my Python code for every Excel file contained in a folder?


I have a folder named with a certain acronym, and inside this folder you can find a certain number of Excel files. The folder's name indicates the name of the apartment (for ex. UDC06_45) and, inside this folder, all of the Excel files' name are composed by: the name of the apartment, followed by the name of the appliance that is located in that apartment (for ex. UDC06_45_Oven).

These Excel files are very simple DataFrames, they contain energy consumption measurements: one column named "timestamps" and one column named "Energy" (all of these measurements have a 15 min frequency). All of the Excel files inside the folder are made with the same identical structure.

My Python code takes as input only one of these Excel files at a time and makes few operations on them (resampling, time interpolation, etc.) starting with the command "pd.read_excel()", and creates an output Excel file with "df.to_excel()" after giving it a name.

What I want to do is to apply my code automatically to all of the files in that folder. The code should take as input only the name of the folder ("UDC06_45") and create as many output files as needed. So if the folder contains only two appliances:

  • "UDC06_45_Oven"
  • "UDC06_45_Fridge"

the code will elaborate them both, one after the other, and I should obtain two dinstinct Excel files as output. Their name is just composed by the input file's name followed by "_output":

  • "UDC06_45_Oven_output"
  • "UDC06_45_Fridge_output".

In general, this must be done for every Excel file contained in that folder. If the folder contains 5 appliances, meaning 5 input Excel files, I should obtain 5 output Excel files... and so on.

How can I do it?


Solution

  • In the following code only assing your path, in my case I have used a test folder path path=r'D:\test' this code will create a new folder automatically in the same path.

    import pandas as pd
    import os
    from glob import glob
    
    path=r'D:\test'  # add whatever your path is in place of 'D:\test'
    input_folder='UDC06_45' # name of input folder
    output_folder=input_folder+'_out' 
    
    new_path=path+'/'+output_folder 
    
    
    if not os.path.exists(new_path):
        os.makedirs(new_path)
        
    files=glob(path+'/'+input_folder+'/'+'*.xlsx')
    
    for file in files:
        name=file.split(path+'/'+input_folder+'\\')[-1].rsplit('.')[0]
        df=pd.read_excel(file)
        #do all your operations here
        df.to_excel(new_path+'/'+name+'_output.xlsx')