How to retrieve file names from subfolders


I have the following folder structure:

root
│   file001.docx
│   file002.docx    
│
└───folder1
   │   file003.docx
   │   file004.docx
   │
   └───subfolder1
       │   file005.docx
       │   file006.docx
       │
       └───subfolder2
           │   file007.docx
   

I want to write a program in which the user enters a root directory and a keyword, and the program lists every file that contains the keyword. For example, if I enter "hello there!", file007.docx should be printed (assume the text "hello there!" is contained in file007.docx), letting the user know that the typed words appear in that Word document.

To approach this, I first build a list of all the Word documents in the folders and subfolders using this code:

def find_doc():
    variable= input('What is your directory?') #asking for root directory
    os.chdir(variable)
    files = []
    for dirpath, dirnames, filenames in os.walk(variable):
        for filename in [f for f in filenames if f.endswith(".docx")]:
            files.append(filename)  
    return files

Now, this is the second piece of code, which searches the contents of each Word document:

all_files= find_doc() # just calling the first function I just made

while True: 
    keyword= input('Input your word or type in Terminate to exit: ')
    for i in range(len(all_files)): 
        text = docx2txt.process(all_files[i]) 
        if keyword.lower() in text.lower():  #to make it case insensitive
            print ((all_files[i]))    
    if keyword== ('Terminate') or keyword== ('terminate'):
        break

In theory, if I enter the word "hello" at the prompt input('Input your word or type in Terminate to exit: '), I should get file007.docx back, because all_files = find_doc() returns:

['file001.docx',
'file002.docx',
'file003.docx',
'file004.docx',
'file005.docx',
'file006.docx',
'file007.docx',]

(os.walk() recurses into every subfolder, which is why all seven files are listed.)

However, it raised an error: FileNotFoundError: [Errno 2] No such file or directory:

Where did I go wrong? Thanks!


Solution

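  • os.walk() gives you bare file names in filenames, without the folder they live in, so docx2txt.process() looks for each one in the current working directory and fails for anything inside a subfolder. I think you want to modify your function into something like this, so that each filename is stored with its associated path.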

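    import os

    def find_doc():
        variable = input('What is your directory? ')  # ask for the root directory
        # no os.chdir() needed: the joined paths below already include the folder
        files = []
        # os.walk() yields (dirpath, dirnames, filenames) for every folder under the root
        for dirpath, dirnames, filenames in os.walk(variable):
            for filename in filenames:
                if filename.endswith(".docx"):
                    # store the path together with the name so the file can be opened later
                    files.append(os.path.join(dirpath, filename))
        return files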
    

    You should also change your while loop so that the exit check runs before the for loop; otherwise typing "terminate" searches every document for that word before the loop exits.

    import docx2txt

    all_files = find_doc()  # list of full paths from the function above

    while True:
        keyword = input('Input your word or type in Terminate to exit: ')
        if keyword.lower() == 'terminate':  # check for exit before searching
            break
        for path in all_files:
            text = docx2txt.process(path)  # extract the document's text
            if keyword.lower() in text.lower():  # case-insensitive match
                print(path)
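
To see why the bare names fail while the joined paths work, here is a minimal sketch that can be run without any Word documents. It builds a throwaway folder tree with tempfile that mirrors part of the layout in the question, using empty placeholder files. The helper name find_docx and the two-file tree are assumptions made for illustration; docx2txt is not involved, and the check only asks whether each file can be located.

```python
import os
import tempfile


def find_docx(root):
    """Return the full path of every .docx file under root (hypothetical helper)."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".docx"):
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)


# Build a small throwaway tree that mirrors part of the question's layout.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "folder1", "subfolder1", "subfolder2")
    os.makedirs(nested)
    for folder, name in [(root, "file001.docx"), (nested, "file007.docx")]:
        # Empty placeholder files; real documents are not needed for this check.
        open(os.path.join(folder, name), "w").close()

    full_paths = find_docx(root)
    bare_names = [os.path.basename(p) for p in full_paths]

    # A bare name resolved against the root only finds top-level files.
    found_by_name = [os.path.exists(os.path.join(root, n)) for n in bare_names]
    # A joined path points at the file wherever it lives in the tree.
    found_by_path = [os.path.exists(p) for p in full_paths]

print(bare_names)     # ['file001.docx', 'file007.docx']
print(found_by_name)  # [True, False]
print(found_by_path)  # [True, True]
```

file007.docx is listed in both cases, but it can only be opened through its joined path, which is the same situation that produced the FileNotFoundError in the question.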