
re.finditer: report a file only once when a pattern matches several times in it


I want to search text files for a pattern and email each file in which a match is found. I'm using re.finditer with two parallel lists (the patterns, and the names used for the files to email). I want to send a file only once, even if the pattern matches several times in it. Is this possible?

file1:
three pat1 patterns
pat1
pat1

file2:
Only one pat2
bla bla

import re
import itertools

cc_files = ["file1", "file2"]
patlist = ["pat1", "pat2"]
prglist = ["prg1", "prg2"]

for (a, b) in itertools.zip_longest(patlist, prglist):
    for cc_file in cc_files:
        with open(cc_file) as f:  # close each file after scanning it
            for i, line in enumerate(f):
                for match in re.finditer(a, line):
                    print('Found on line %s: %s in file %s' % (i+1, match.group(), cc_file))
                    # email: filename = b + "_" + cc_file

Output:
Found on line 1: pat1 in file file1
Found on line 2: pat1 in file file1
Found on line 3: pat1 in file file1
Found on line 1: pat2 in file file2

What I want:
pat1 in file1
pat2 in file2


Solution

  • I don't think you need anything more complicated than this:

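    cc_files = ["file1", "file2"]
    patlist = ["pat1", "pat2"]
    prglist = ["prg1", "prg2"]
    
    for cc_file in cc_files:
        with open(cc_file) as f:
            text = f.read()
        for (a, b) in zip(patlist, prglist):
            # plain substring test; use re.search(a, text) if a is a regex
            if a in text:
                print("File", cc_file, "is program", b)
                break  # report each file once, for the first pattern found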
    

    I'm not sure why you used zip_longest. If either of the two lists ends early, what can you possibly do with the rest?
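To illustrate, suppose patlist had a hypothetical third entry "pat3" with no program. zip just stops at the end of the shorter list, while itertools.zip_longest pads the gap with None. In the question's loop that None would become b, and building the email filename with b + "_" + cc_file would raise TypeError; if patlist were the shorter list instead, re.finditer(None, line) would raise TypeError.

```python
import itertools

patlist = ["pat1", "pat2", "pat3"]  # "pat3" is a hypothetical pattern with no program
prglist = ["prg1", "prg2"]

# zip stops as soon as the shorter list runs out
print(list(zip(patlist, prglist)))
# [('pat1', 'prg1'), ('pat2', 'prg2')]

# zip_longest keeps going and fills the missing program with None
print(list(itertools.zip_longest(patlist, prglist)))
# [('pat1', 'prg1'), ('pat2', 'prg2'), ('pat3', None)]
```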

    If you need line numbers, then it gets a little more complicated, but not much.
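One way to keep line numbers, as a minimal sketch: it assumes the entries in patlist are regular expressions (so it uses re.search rather than a substring test), and `first_matches` is a hypothetical helper name. For each pattern/program pair it scans each file line by line and stops at the first matching line, so a file is reported at most once per pattern no matter how many times that pattern occurs. Unlike the loop above, a file containing several different patterns is reported once for each of them. The temporary files are just samples modeled on file1 and file2 from the question.

```python
import re
import tempfile
from pathlib import Path


def first_matches(cc_files, patlist, prglist):
    """Return (pattern, program, file, line_no) for the first matching line
    of each pattern in each file; later matches in the same file are skipped."""
    results = []
    for a, b in zip(patlist, prglist):
        regex = re.compile(a)
        for cc_file in cc_files:
            with open(cc_file) as f:
                for i, line in enumerate(f, start=1):
                    if regex.search(line):
                        results.append((a, b, cc_file, i))
                        break  # one hit per (pattern, file) is enough
    return results


if __name__ == "__main__":
    # Sample files modeled on the question, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        file1 = Path(tmp, "file1")
        file2 = Path(tmp, "file2")
        file1.write_text("three pat1 patterns\npat1\npat1\n")
        file2.write_text("Only one pat2\nbla bla\n")

        for a, b, cc_file, line_no in first_matches(
            [file1, file2], ["pat1", "pat2"], ["prg1", "prg2"]
        ):
            print(f"{a} in {cc_file.name} (first match on line {line_no})")
            # email step (hypothetical): filename = b + "_" + cc_file.name
        # pat1 in file1 (first match on line 1)
        # pat2 in file2 (first match on line 1)
```

Compiling each pattern once with re.compile avoids re-parsing it for every line; the `break` is what turns "every match" into "first match per file".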