Tags: python, file-io, glob, pathlib

How to load multiple text files from a folder into a Python list variable


I have a folder full of text documents, the text of which needs to be loaded into a single list variable.

Each element of the list should be the full text of one document.

So far I have the following code, but it does not work as intended:

import glob
import os

current_working_directory = os.getcwd()
dir = os.path.join(current_working_directory, 'FolderName')
file_list = glob.glob(dir + '/*.txt')
corpus = [] #-->my list variable
for file_path in file_list:
    text_file = open(file_path, 'r')
    corpus.append(text_file.readlines()) 
    text_file.close()

Is there a better way to do this?

Edit: Replaced the CSV-reading function (read_csv) with a text-reading function (readlines()).


Solution

  • You just need to read() each file in and append it to your corpus list as follows:

    import glob
    import os
    
    file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))
    
    corpus = []
    
    for file_path in file_list:
        with open(file_path) as f_input:
            corpus.append(f_input.read())
    
    print(corpus)
    

    Each list entry will then be the entire contents of one text file. Note that using readlines() instead would give you a list of lines for each file rather than the raw text.
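
A quick way to see the difference between read() and readlines() is to read the same small file both ways. This is a minimal sketch; the temporary file and its two lines of contents are made up purely for illustration:

```python
import os
import tempfile

# Write a small sample file (hypothetical contents) into a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as f_output:
        f_output.write("first line\nsecond line\n")

    with open(path) as f_input:
        text = f_input.read()        # one string holding the whole file
    with open(path) as f_input:
        lines = f_input.readlines()  # a list of lines, each keeping its "\n"

print(repr(text))  # 'first line\nsecond line\n'
print(lines)       # ['first line\n', 'second line\n']

# Joining the lines back together recovers the original text exactly.
print("".join(lines) == text)  # True
```

So code that already collected readlines() output can recover the full text of each document with "".join(lines).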

    With a list comprehension:

    import glob
    import os

    file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))
    
    corpus = [open(file).read() for file in file_list]
    

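    This approach is shorter, but because there is no with block, each file is only closed when its file object is garbage-collected rather than at a well-defined point. CPython usually does this promptly, but other Python implementations may keep many files open at once, so resource usage can be higher.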
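
If you want the brevity of the list comprehension without leaving file closing to the garbage collector, pathlib (one of the question's tags) offers Path.read_text(), which opens, reads and closes a file in one call. A minimal sketch: load_corpus is a hypothetical helper name, and the sorted() call is an addition of this sketch that makes the list order predictable, since glob results come back in arbitrary order:

```python
import tempfile
from pathlib import Path


def load_corpus(folder):
    """Return the full text of every *.txt file in folder, one string per file."""
    # sorted() fixes the order, because Path.glob() yields files in arbitrary order.
    # Path.read_text() opens, reads and closes each file, so nothing is left open.
    return [path.read_text() for path in sorted(Path(folder).glob("*.txt"))]


# Usage with a temporary folder holding two hypothetical documents and one non-.txt file.
with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "b.txt").write_text("second document")
    Path(tmp_dir, "a.txt").write_text("first document")
    Path(tmp_dir, "notes.csv").write_text("ignored")
    corpus = load_corpus(tmp_dir)

print(corpus)  # ['first document', 'second document']
```

For the folder in the question this would be called as load_corpus(Path.cwd() / "FolderName").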