Tags: python, function, append, extract, glob

Extracting from multiple text files, appending results


I wrote a function that extracts values from a text file with regex and returns each variable. I have many files and want to iterate over them, appending the results as I go. Each variable is a list, and I will combine the lists to create a DataFrame; that part already works.

I understand there is glob, but I'm having trouble implementing it. I've used it for directory and file listings in the past. I've been searching and reading a lot but am clearly missing something obvious.

I've written the function and have used glob to list file names before. I know of list.append, but I'm unsure how to combine it with glob (or something similar).

How can I iterate over the files, call this function and append the results after each iteration?

TEXT:

A bunch of sentences
CUSTOMER: 78787
amount (500 dollars)
A bunch of sentences

CODE

def find(customer, amount):    
    with open(r"file.txt",'r') as myfile:
        text = myfile.read() 

    customer = re.findall(r"^CUSTOMER:[\s](.*)\d+", text) 
    amount = re.findall(r'\((.*?)\)', text)

    return customer, amount

The function works, but only for the one file currently read.


Solution

  • Just loop over the list of files and call your function on each one. Also, there is no point in passing in customer or amount: they are created inside your find function when it runs, and the caller gets them back through the return value.

    You can use pathlib.Path's glob method. Here goes:

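    import re
    from pathlib import Path
    
    def find(file_name):
        # Read the whole file, then run both patterns over its text
        with open(file_name, 'r') as f:
            text = f.read()
    
        # re.MULTILINE lets ^ match at the start of any line, not just the
        # start of the file; (\d+) captures the whole number
        customer = re.findall(r"^CUSTOMER:\s(\d+)", text, re.MULTILINE)
        amount = re.findall(r'\((.*?)\)', text)
    
        return customer, amount
    
    file_dir = Path("path_to_directory_containing_files") # CHANGE THIS
    all_files = file_dir.glob("*.txt") # this should be whatever pattern that matches all the input files
    results = [find(f) for f in all_files] # one (customer, amount) tuple per file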
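
The list comprehension gives one (customer, amount) tuple per file, while the question asks for the results to be appended into combined lists. A minimal sketch of that step follows. The helper name collect is hypothetical, and the customer pattern assumes the ID is all digits and that the CUSTOMER: line can appear anywhere in the file (hence re.MULTILINE); adjust both to your real files.

```python
import re
from pathlib import Path

# Assumed customer pattern: re.MULTILINE lets ^ match at the start of any
# line, and (\d+) captures the whole number.
CUSTOMER_RE = re.compile(r"^CUSTOMER:\s*(\d+)", re.MULTILINE)
AMOUNT_RE = re.compile(r"\((.*?)\)")  # same parenthesis pattern as in the question


def find(file_name):
    """Return (customers, amounts) found in a single text file."""
    text = Path(file_name).read_text()
    return CUSTOMER_RE.findall(text), AMOUNT_RE.findall(text)


def collect(directory, pattern="*.txt"):
    """Hypothetical helper: run find() on every matching file and merge the lists."""
    customers, amounts = [], []
    # sorted() makes the file order deterministic; glob() itself guarantees none
    for path in sorted(Path(directory).glob(pattern)):
        file_customers, file_amounts = find(path)
        customers.extend(file_customers)  # extend keeps the combined list flat
        amounts.extend(file_amounts)
    return customers, amounts
```

Calling customers, amounts = collect("path_to_directory_containing_files") then yields two flat lists. If they have the same length, they can go straight into pandas.DataFrame({"customer": customers, "amount": amounts}); a file with several parenthesised phrases would make the lengths differ, so it is worth checking that first.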