
Python: Delete lines from a file that match certain criteria


I am trying to delete lines from a file based on specific criteria. The script I have seems to work, but I have to chain too many conditions together. Is there a way to make a variable that holds all the criteria I would like to remove from the file?

Example code

with open("AW.txt", "r+", encoding='utf-8') as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        words = line.split()
        # Keep the line only if it contains none of the tokens (needs `and`, not `or`).
        if "PPL" not in words and "PPLX" not in words and "PPLC" not in words:
            f.write(line)
    f.truncate()
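Chaining `not in` tests with `or` keeps a line unless it contains all three tokens at once; the intended check is "the line shares no token with the criteria". A set expresses that in a single call. This is a minimal sketch; the names `CRITERIA` and `keep_line` are hypothetical, and it keeps the question's whole-token semantics from `line.split()`.

```python
from __future__ import annotations

# Hypothetical criteria set; add as many tokens as needed.
CRITERIA = {"PPL", "PPLX", "PPLC"}


def keep_line(line: str, criteria: set[str]) -> bool:
    """True if none of the line's whitespace-separated tokens is a criterion."""
    # isdisjoint accepts any iterable, so the list from split() is passed directly.
    return criteria.isdisjoint(line.split())


lines = ["alpha PPL beta\n", "APPLE pie\n", "PPLX\n", "nothing here\n"]
print([line for line in lines if keep_line(line, CRITERIA)])
# ['APPLE pie\n', 'nothing here\n']
```

Because tokens are compared whole, "APPLE" survives even though it contains the letters "PPL".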

I was thinking more along these lines, but it fails when I add multiple criteria:

with open('AW.txt', 'r+', encoding='utf-8') as f:
    lines = f.readlines()
    criteria = 'PPL'
    output = [line for line in lines if criteria not in line]
    # Write back while the file is still open, then cut off the old tail.
    f.seek(0)
    f.writelines(output)
    f.truncate()
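The single-criterion version breaks with several criteria because `criteria not in line` only works when `criteria` is one string; if it is a list, Python raises `TypeError`, since `in` on a string needs a string on the left. A minimal sketch that keeps the list-comprehension style and tests each criterion with `any()` (the helper names are hypothetical):

```python
from __future__ import annotations


def remove_lines_containing(lines: list[str], criteria: list[str]) -> list[str]:
    """Return the lines in which none of the criteria occurs as a substring."""
    # `criteria not in line` raises TypeError when criteria is a list, because
    # `in` on a string needs a string on the left; any() tests each one instead.
    return [line for line in lines if not any(c in line for c in criteria)]


def rewrite_file(path: str, criteria: list[str]) -> None:
    """Filter the file in place, writing while it is still open."""
    with open(path, "r+", encoding="utf-8") as f:
        output = remove_lines_containing(f.readlines(), criteria)
        f.seek(0)             # rewind before overwriting
        f.writelines(output)
        f.truncate()          # drop the leftover tail of the longer original
```

Usage would be `rewrite_file("AW.txt", ["PPL", "PPLX", "PPLC"])`. Like the `criteria not in line` attempt, this is a substring test, so "PPL" also matches inside "PPLX" or "APPLE".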

Regards


Solution

  • You can use regular expressions to reduce the number of statements and checks in the code. If you have a list of criteria, which can be dynamic (call it crit_list), the code would look like this:

    import re

    with open("AW.txt", "r+", encoding='utf-8') as f:
        new_f = f.readlines()
        crit_list = ['PPL', 'PPLC', 'PPLX']    # can hold any number of criteria
        obj = re.compile(r'%s' % ('|'.join(crit_list)))
        # Keep only the lines in which no criterion is found.
        out_lines = [line for line in new_f if not obj.search(line)]
        f.truncate(0)    # empty the file
        f.seek(0)        # rewind so writing starts at the beginning
        f.writelines(out_lines)
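Joining the criteria straight into the pattern is fine for plain letters like PPL, but characters such as `.`, `+` or `(` have special meaning in a regex, so a criterion containing them would not be matched literally. A minimal sketch, assuming a hypothetical `build_pattern` helper, that passes each criterion through `re.escape` first:

```python
from __future__ import annotations

import re


def build_pattern(criteria: list[str]) -> re.Pattern[str]:
    """Compile an alternation that matches any criterion literally."""
    # re.escape turns '.', '+', '(' and similar into literal characters,
    # so a criterion like "P.L" cannot accidentally match "PXL".
    return re.compile("|".join(re.escape(c) for c in criteria))


pattern = build_pattern(["PPL", "P.L", "C++"])
lines = ["PXL only\n", "uses C++\n", "P.L literal\n", "plain\n"]
kept = [line for line in lines if not pattern.search(line)]
print(kept)  # ['PXL only\n', 'plain\n']
```

For the criteria in the answer (PPL, PPLC, PPLX) escaping changes nothing; it only matters once the list can contain arbitrary strings.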
    

    Using a regex makes this look different from the OP's code, so here is what the two regex lines do:

    obj = re.compile(r'%s' % ('|'.join(crit_list)))
    

    This line compiles the regular expression 'PPL|PPLC|PPLX', which matches if at least one of these strings appears anywhere in the line. The | alternation replaces the chain of or conditions, however many criteria there are. Note that this is a substring match: unlike the line.split() checks in the question, 'PPL' also matches inside words such as 'APPLE'.
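If the criteria should only match as whole words, closer to the question's `line.split()` checks, the alternation can be wrapped in `\b` word boundaries. A minimal sketch with a hypothetical `build_word_pattern` helper; note that `\b` treats letters, digits and underscore as word characters, so it is similar to, but not identical with, splitting on whitespace.

```python
from __future__ import annotations

import re


def build_word_pattern(criteria: list[str]) -> re.Pattern[str]:
    """Match any criterion only as a whole word, not inside a longer word."""
    alternation = "|".join(re.escape(c) for c in criteria)
    # (?:...) groups the alternatives so both \b anchors apply to every one.
    return re.compile(rf"\b(?:{alternation})\b")


substring = re.compile("PPL|PPLC|PPLX")
whole_word = build_word_pattern(["PPL", "PPLC", "PPLX"])

for line in ["APPLE pie", "code PPLX here", "PPL"]:
    print(repr(line), bool(substring.search(line)), bool(whole_word.search(line)))
# 'APPLE pie' True False
# 'code PPLX here' True True
# 'PPL' True True
```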

    out_lines = [line for line in new_f if not obj.search(line)]
    

    This statement searches each line for the criteria and keeps the line only if none of them is found; any line containing at least one criterion is dropped.
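The same keep-if-no-match filter works without `readlines()`, which loads the whole file into memory. A minimal sketch for large files, assuming a hypothetical `filter_file_streaming` helper: lines are streamed into a temporary file in the same directory, which then replaces the original via `os.replace`. `newline=""` keeps the original line endings untouched.

```python
from __future__ import annotations

import os
import re
import tempfile


def filter_file_streaming(path: str, criteria: list[str]) -> None:
    """Drop lines matching any criterion without loading the whole file."""
    pattern = re.compile("|".join(re.escape(c) for c in criteria))
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with open(path, "r", encoding="utf-8", newline="") as src, \
                os.fdopen(fd, "w", encoding="utf-8", newline="") as dst:
            for line in src:
                if not pattern.search(line):  # keep lines with no criterion
                    dst.write(line)
        os.replace(tmp_path, path)  # swap the filtered copy in for the original
    except BaseException:
        os.remove(tmp_path)  # clean up the partial copy, then re-raise
        raise
```

One trade-off: the replacement file gets `mkstemp`'s restrictive default permissions rather than the original file's.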

    Hope that clears your doubts.