Search code examples
python-3.xpandaswindowsoperating-systemlistdir

Loop over excel files' paths under a directory and pass them to data manipulation function in Python


I need to check the excel files under a directory /Users/x/Documents/test/ by DataCheck function from data_check.py, so I can do data manipulation of many excel files, data_check.py has code structure as follows:

import pandas as pd

def DataCheck(filePath):
    df = pd.read_excel(filePath)
    try:
        df = df.dropna(subset=['building', 'floor', 'room'], how = 'all')
        ...
        ...
        ...
        df.to_excel(writer, 'Sheet1', index = False)
    
if __name__ == '__main__':
    status = True
    while status:
        rawPath = input(r"")
        filePath = rawPath.strip('\"')
        if filePath.strip() == "":
            status = False
        DataCheck(filePath)

In order to loop all the excel files' paths under a directory, I use:

import os

directory = '/Users/x/Documents/test/'
for filename in os.listdir(directory):
    if filename.endswith(".xlsx") or filename.endswith(".xls"):
        print(os.path.join(directory, filename))
    else:
        pass

Out:

/Users/x/Documents/test/test 3.xlsx
/Users/x/Documents/test/test 2.xlsx
/Users/x/Documents/test/test 4.xlsx
/Users/x/Documents/test/test.xlsx

But I don't know how to combine the code above together, to pass the excel files' paths to DataCheck(filePath).

Thanks for your kind help at advance.


Solution

  • Call the function with the names instead of printing them:

    import os
    
    directory = '/Users/x/Documents/test/'
    for filename in os.listdir(directory):
        if filename.endswith(".xlsx") or filename.endswith(".xls"):
            fullname = os.path.join(directory, filename)
            DataCheck(fullname)