I am trying to delete lines from a file using specific criteria. The script I have seems to work, but I have to add too many or statements. Is there a way to make a variable that holds all the criteria I would like to remove from the file?
Example code
with open("AW.txt", "r+", encoding='utf-8') as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if "PPL" not in line.split() or "PPLX" not in line.split() or "PPLC" not in line.split():
            f.write(line)
    f.truncate()
I was thinking more along these lines, but it fails when I add multiple criteria:
output = []
with open('AW.txt', 'r+', encoding='utf-8') as f:
    lines = f.readlines()
    criteria = 'PPL'
    output = [line for line in lines if criteria not in line]
    f.writelines(output)
Regards
You can use regular expressions here, which reduces the number of statements and checks in the code. If you have a list of criteria, which can be dynamic, call it crit_list; the code would then look like this:
import re

with open("AW.txt", "r+", encoding='utf-8') as f:
    new_f = f.readlines()
    crit_list = ['PPL', 'PPLC', 'PPLX']  # Can use any number of criteria
    obj = re.compile(r'%s' % ('|'.join(crit_list)))
    out_lines = [line for line in new_f if not obj.search(line)]
    f.truncate(0)
    f.seek(0)
    f.writelines(out_lines)
Using a regex makes this look different from what the OP posted, so let me explain the two lines containing the regex.
obj = re.compile(r'%s' % ('|'.join(crit_list)))
This line creates a regex object from the regular expression 'PPL|PPLC|PPLX', which means: match at least one of these strings in the given line. It can be thought of as a substitute for using as many ors in the code as there are criteria. Note that search() looks for substrings, so 'PPL' on its own already matches any line containing PPLC or PPLX.
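The question checked whole words via line.split(), while a plain alternation matches substrings. A minimal sketch of a stricter pattern follows, assuming the criteria should match only as whole words; build_pattern is a hypothetical helper name, not part of the original code. Note that \b is not identical to splitting on whitespace: it would also match "PPL," with trailing punctuation.

```python
import re

def build_pattern(crit_list):
    """Compile one regex that matches any criterion as a whole word."""
    # re.escape guards against criteria that contain regex metacharacters.
    alternatives = '|'.join(re.escape(c) for c in crit_list)
    # \b anchors the match at word boundaries on both sides.
    return re.compile(r'\b(?:%s)\b' % alternatives)

obj = build_pattern(['PPL', 'PPLC', 'PPLX'])
print(bool(obj.search('code PPLX here')))  # True
print(bool(obj.search('APPLE pie')))       # False
```

Without the word boundaries, the second line would match as well, because "APPLE" contains "PPL".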
out_lines = [line for line in new_f if not obj.search(line)]
This statement searches each line for the criteria and keeps the line only if none of them is found; any line that matches at least one criterion is dropped.
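To see the filter in action without touching a file, here is a small sketch on made-up sample lines (the sample data is an assumption for illustration, not taken from AW.txt):

```python
import re

crit_list = ['PPL', 'PPLC', 'PPLX']
obj = re.compile('|'.join(crit_list))

# Hypothetical sample lines standing in for the file contents.
sample_lines = ['1 PPL north\n', '2 ADM1 south\n', '3 PPLX east\n', '4 RGN west\n']

# Keep only the lines in which no criterion is found.
out_lines = [line for line in sample_lines if not obj.search(line)]
print(out_lines)  # ['2 ADM1 south\n', '4 RGN west\n']
```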
Hope that clears your doubts.
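If a regex feels heavier than needed, a non-regex variant is also possible. The sketch below assumes the whole-word check from the question (line.split()) is what is wanted; filter_lines is a hypothetical helper name, and it is shown on an in-memory list rather than on the file.

```python
def filter_lines(lines, criteria):
    """Return the lines whose whitespace-separated words contain no criterion."""
    criteria = set(criteria)
    # isdisjoint is True when the line shares no word with the criteria set.
    return [line for line in lines if criteria.isdisjoint(line.split())]

sample = ['1 PPL north\n', '2 APPLE south\n', '3 PPLX east\n']
kept = filter_lines(sample, ['PPL', 'PPLC', 'PPLX'])
print(kept)  # ['2 APPLE south\n']
```

The set plays the role of the single variable holding all criteria, so adding a new one means adding one element rather than another or clause.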