Tags: python, pandas, directory-structure, listdir

How do you process all files in a folder?


I want to run my code on all files in a directory. The code works fine on a single file, but when I try to iterate over multiple files I get:

FileNotFoundError: [Errno 2] No such file or directory: 'file.xlsx'

import os
import pandas as pd

directory = r"C:/Users/name/Desktop/folder/2018"
arrivals_aggregated = pd.DataFrame()

print(os.listdir(directory))
for filename in os.listdir(directory):

    print('current file is ' + filename)
    x = pd.ExcelFile(filename)
    symbols = x_symbols(x)  # x_symbols and x_arrivals are my own helpers (not shown)
    arv = x.parse(sheet_name='Arrivals', skiprows=5, usecols=list(range(1, 24)))  # columns 1 to 23
    arrivals = x_arrivals(arv, x)

arrivals_aggregated.append(arrivals)

I expect it to iterate over all the files in the directory, processing each one and aggregating the results into one big DataFrame, arrivals_aggregated. Instead, it stops at x = pd.ExcelFile(filename) with the error above, even though the file is in the folder and its name is printed when I include

print('current file is ' + filename)

It fails on the very first file in the folder, before any of the processing code runs.


Solution

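  • Whether this works depends on where you run the script. os.listdir returns bare file names, not full paths, and pd.ExcelFile resolves a bare name relative to the current working directory (the directory you ran the script from). If filename is not present in that directory, you get a FileNotFoundError.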

    I would instead do:

    x = pd.ExcelFile(os.path.join(directory, filename))

    which passes the full path of each file to pd.ExcelFile, so the lookup no longer depends on where the script is run from.
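A small reproduction of the cause, using only the standard library. It is a sketch under stated assumptions: the two temporary directories are hypothetical stand-ins for the question's 2018 folder and for the directory the script is launched from, and file.xlsx is just an empty placeholder. It checks whether each name returned by os.listdir can be found as-is, and again after joining it to its folder.

```python
import os
import tempfile

# Hypothetical stand-ins: data_dir plays the question's 2018 folder,
# run_dir plays whatever directory the script is launched from.
data_dir = tempfile.mkdtemp()
run_dir = tempfile.mkdtemp()
open(os.path.join(data_dir, "file.xlsx"), "wb").close()  # empty placeholder file

os.chdir(run_dir)  # the working directory is not the data folder

for filename in os.listdir(data_dir):
    full_path = os.path.join(data_dir, filename)
    print(filename)                   # file.xlsx: a bare name, no folder part
    print(os.path.exists(filename))   # False: looked up relative to run_dir
    print(os.path.exists(full_path))  # True: the full path points at the real file
```

pd.ExcelFile(filename) behaves like the first check and raises FileNotFoundError; pd.ExcelFile(full_path) behaves like the second.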
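The question also aims to collect every file's result into arrivals_aggregated, which the posted code cannot do as written: the append call sits after the loop, and DataFrame.append never modified a frame in place (it returned a new one, and it was removed in pandas 2.0). A common pattern is to collect the per-file frames in a list and call pd.concat once at the end. The sketch below assumes each file can be turned into a DataFrame by some function; aggregate_folder and read_one are hypothetical names, not pandas APIs, and the demo uses CSV files only so it runs without an Excel engine such as openpyxl.

```python
import os
import tempfile

import pandas as pd


def aggregate_folder(directory, read_one, suffix=".xlsx"):
    """Hypothetical helper: call read_one(path) for every file in `directory`
    whose name ends with `suffix`, and stack the returned DataFrames."""
    frames = []
    # os.listdir order is arbitrary, so sort for a reproducible result.
    for filename in sorted(os.listdir(directory)):
        # Skip non-matching files and Excel's "~$" lock files.
        if not filename.lower().endswith(suffix) or filename.startswith("~$"):
            continue
        path = os.path.join(directory, filename)  # full path, not a bare name
        frames.append(read_one(path))
    # pd.concat raises on an empty list, so handle the no-match case.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Demo with CSV files so it runs without an Excel engine installed.
demo_dir = tempfile.mkdtemp()
pd.DataFrame({"arrivals": [1, 2]}).to_csv(os.path.join(demo_dir, "a.csv"), index=False)
pd.DataFrame({"arrivals": [3]}).to_csv(os.path.join(demo_dir, "b.csv"), index=False)
open(os.path.join(demo_dir, "notes.txt"), "w").close()  # ignored: wrong suffix

combined = aggregate_folder(demo_dir, pd.read_csv, suffix=".csv")
print(combined["arrivals"].tolist())  # [1, 2, 3]
```

For the question's files, read_one would wrap the existing per-file code: a function that opens pd.ExcelFile(path), parses the 'Arrivals' sheet with the same skiprows and usecols, and returns x_arrivals(arv, x), where x_arrivals is the asker's own helper and is not shown.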