I wrote a function that uses regex to extract values from a text file. The function returns each variable as a list, and I will combine these lists to create a DataFrame. This works for a single file. I have many files and want to iterate over them and append the results.
I know about glob and have used it for directory and file lists in the past, but I'm having trouble applying it here. I also know of list.append, but I'm unsure how to combine it with glob (or similar). I've been searching and reading a lot but am clearly missing something obvious.
How can I iterate over the files, call this function and append the results after each iteration?
TEXT:
A bunch of sentences
CUSTOMER: 78787
amount (500 dollars)
A bunch of sentences
CODE:
import re

def find(customer, amount):
    with open(r"file.txt", 'r') as myfile:
        text = myfile.read()
    customer = re.findall(r"^CUSTOMER:[\s](.*)\d+", text)
    amount = re.findall(r'\((.*?)\)', text)
    return customer, amount
The function works, but only for the one file currently read.
Just loop through the list of files and call your function on each one. Also, there is no point in passing in customer or amount. They are simply created inside your find function each time it runs, and the caller gets them back through the return value.
You can use pathlib.Path's glob method.
Here goes:
import re
from pathlib import Path

def find(file_name):
    # Read one file and return the regex matches found in it
    with open(file_name, 'r') as f:
        text = f.read()
    customer = re.findall(r"^CUSTOMER:[\s](.*)\d+", text)
    amount = re.findall(r'\((.*?)\)', text)
    return customer, amount

file_dir = Path("path_to_directory_containing_files")  # CHANGE THIS
all_files = file_dir.glob("*.txt")  # use whatever pattern matches all the input files
results = [find(f) for f in all_files]  # one (customer, amount) tuple per file
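Note that results ends up as a list of (customer, amount) tuples, one per file. If you would rather have two flat lists that grow with each file, which is what the question asks for, something like the sketch below may be closer. The collect helper is a hypothetical name, not something from the question. The customer regex is also an assumed variation on yours: it uses re.MULTILINE so that ^ can match at the start of any line, and (\d+) so that the whole number is captured. Treat that as a guess about what you want rather than a required change.

```python
import re
from pathlib import Path


def find(file_name):
    """Return (customers, amounts) found in one file."""
    text = Path(file_name).read_text()
    # Assumption: re.MULTILINE lets ^ match at the start of any line,
    # and (\d+) captures the full customer number.
    customer = re.findall(r"^CUSTOMER:\s*(\d+)", text, flags=re.MULTILINE)
    amount = re.findall(r"\((.*?)\)", text)
    return customer, amount


def collect(file_dir, pattern="*.txt"):
    """Hypothetical helper: run find() on every matching file and pool the results."""
    customers, amounts = [], []
    # sorted() makes the processing order reproducible
    for path in sorted(Path(file_dir).glob(pattern)):
        customer, amount = find(path)
        # extend (not append) keeps each list flat instead of nesting lists
        customers.extend(customer)
        amounts.extend(amount)
    return customers, amounts
```

You would call it as customers, amounts = collect("path_to_directory_containing_files"). If the two lists have the same length, pd.DataFrame({"customer": customers, "amount": amounts}) should then build the DataFrame. If a file can contain more parenthesised amounts than customers, the lengths will differ and you would need to decide how to pair them first.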