I need to check the excel files under a directory /Users/x/Documents/test/
by DataCheck
function from data_check.py
, so I can do data manipulation of many excel files, data_check.py
has code structure as follows:
import pandas as pd
def DataCheck(filePath):
df = pd.read_excel(filePath)
try:
df = df.dropna(subset=['building', 'floor', 'room'], how = 'all')
...
...
...
df.to_excel(writer, 'Sheet1', index = False)
if __name__ == '__main__':
status = True
while status:
rawPath = input(r"")
filePath = rawPath.strip('\"')
if filePath.strip() == "":
status = False
DataCheck(filePath)
In order to loop all the excel files' paths under a directory, I use:
import os
directory = '/Users/x/Documents/test/'
for filename in os.listdir(directory):
if filename.endswith(".xlsx") or filename.endswith(".xls"):
print(os.path.join(directory, filename))
else:
pass
Out:
/Users/x/Documents/test/test 3.xlsx
/Users/x/Documents/test/test 2.xlsx
/Users/x/Documents/test/test 4.xlsx
/Users/x/Documents/test/test.xlsx
But I don't know how to combine the code above together, to pass the excel files' paths to DataCheck(filePath)
.
Thanks for your kind help at advance.
Call the function with the names instead of printing them:
import os
directory = '/Users/x/Documents/test/'
for filename in os.listdir(directory):
if filename.endswith(".xlsx") or filename.endswith(".xls"):
fullname = os.path.join(directory, filename)
DataCheck(fullname)