My first post! I am trying to write a script that iterates through a directory full of HTML files and parses each one using re.findall. So far it prints the matching files correctly, but the else branch also seems to run for the same file (I assumed it wouldn't unless the if condition failed?):
import re
import os
import codecs

dirpath = #path to local directory
for file_a in os.listdir(dirpath):
    filepath = os.path.join(dirpath, file_a)
    f = codecs.open(filepath, 'r', 'utf8')
    lines = f.readlines()
    for line in lines:
        if re.findall('Pattern X', line):
            print('Pattern X detected!', file_a)
        else:
            print('Pattern X not detected!', file_a)
I get an output similar to this:
Pattern X detected! test.html
Pattern X not detected! test.html
Thanks in advance!
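Your if/else runs once per line, not once per file, so every line that doesn't contain the pattern prints the "not detected" message, even when another line in the same file matches. If you just want to know whether the pattern is present anywhere in the file, you don't need findall. Read the whole file and test it once with re.search: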
import re
import os
import codecs

dirpath = #path to local directory
for file_a in os.listdir(dirpath):
    filepath = os.path.join(dirpath, file_a)
    # the with block closes the file once it has been read
    with codecs.open(filepath, 'r', 'utf8') as f:
        # one search over the whole file, so one message per file
        if re.search('Pattern X', f.read()):
            print('Pattern X detected!', file_a)
        else:
            print('Pattern X not detected!', file_a)
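If you'd rather keep reading line by line (for example, because the files are large), here is a rough sketch of one way to still get a single result per file. The function names file_matches and scan_directory are made up for illustration and are not part of the original code. It also assumes Python 3 and that skipping subdirectories is what you want.

```python
import os
import re


def file_matches(filepath, pattern):
    """Return True if any single line of the file matches the pattern."""
    regex = re.compile(pattern)
    with open(filepath, 'r', encoding='utf8') as f:
        # any() stops reading at the first line that matches
        return any(regex.search(line) for line in f)


def scan_directory(dirpath, pattern):
    """Map each regular file name in dirpath to True (match) or False."""
    results = {}
    for name in sorted(os.listdir(dirpath)):
        filepath = os.path.join(dirpath, name)
        if os.path.isfile(filepath):  # skip subdirectories
            results[name] = file_matches(filepath, pattern)
    return results
```

You could then loop over scan_directory(dirpath, 'Pattern X').items() and print one message per file. One caveat: because this checks each line separately, it won't find a pattern that spans a line break, whereas searching f.read() can.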