
Removing items from a list in Python following a validity check


Background:

I am writing a little script which requires, as one of its arguments, a file containing a list of email addresses. The script will then go on to use the email addresses over a telnet connection to an SMTP server, so they need to be syntactically valid. Consequently I have put in a function to check email address validity (incidentally, this regex may not be perfect, but it is not the focus of the question, so please bear with me; it will probably be loosened up):

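import re

def checkmailsyntax(email):
    # Raw string so the backslashes reach the regex engine unchanged.
    match = re.match(r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$', email)

    # Returns True when the address does NOT match, i.e. is syntactically invalid.
    return match is None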

The main() program goes on to read the input filename as an argument (via argparse) and load the file's lines into a (currently global) list:

with open(args.targetfile) as targets:
    target_email_list = targets.readlines()

I figured it would be great for the script to automatically delete an email address from the list (rather than just telling you it was wrong which is what it used to do) if the checkmailsyntax function failed. This cleaned list could then go on to submit syntactically valid email addresses to the SMTP server:

for i in target_email_list:
    if checkmailsyntax(i):
        target_email_list.remove(i)

Error-checking code that I have put in both before and after the delete-element snippet to see if it's doing its job:

for i in target_email_list:
    print i

The issue: The output of the code is thus:

Before delete element snippet (and the entire contents of the file submitted):

[email protected]  
[email protected]  
[email protected]  
noemail.com  
incorrectemail.com  
[email protected]  
pretendemail.com  
wrongemail.com  
[email protected]  
badlywrong.com  
[email protected]  

After delete element snippet:

[email protected]  
[email protected]  
[email protected]  
incorrectemail.com  
[email protected]  
wrongemail.com  
[email protected]  
[email protected]  

So I'm pretty stumped as to why 'noemail.com', 'pretendemail.com' and 'badlywrong.com' were removed and yet 'incorrectemail.com' and 'wrongemail.com' were not. It seems to occur when there are two syntactically incorrect emails in a row in the file.

Can anyone point me in the right direction?


Solution

  • It is because you are removing elements from the list while iterating over it:

    for i in target_email_list:
        if checkmailsyntax(i):
            target_email_list.remove(i) # here
    

    In your input, these two invalid entries are adjacent:

    pretendemail.com  
    wrongemail.com
    

    The for loop tracks its position in the list by index. Once you remove pretendemail.com, wrongemail.com shifts down into the index that was just visited, so the iterator treats it as already handled. The next item the loop yields is [email protected], and wrongemail.com is never checked for valid syntax. The same thing happens to incorrectemail.com after noemail.com is removed. You can add print(i) before checking the syntax and see for yourself.

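    Instead, build a new list with a list comprehension. Because checkmailsyntax returns True for an invalid address, the condition has to be negated to keep only the valid ones:

    valid_emails = [email for email in target_email_list if not checkmailsyntax(email)]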
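
To see the skipping in action and compare it with two safe alternatives, here is a minimal sketch. The sample addresses are made up (the real ones are not shown in the question), and is_invalid is a hypothetical stand-in for checkmailsyntax that also strips the trailing newline readlines() leaves on each line.

```python
import re

# Pattern copied from the question; compiled once for reuse.
EMAIL_RE = re.compile(
    r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$'
)


def is_invalid(email):
    """Hypothetical stand-in for checkmailsyntax: True when the check fails."""
    return EMAIL_RE.match(email.strip()) is None


# Made-up sample data: invalid entries appear in adjacent pairs, as in the question.
sample = [
    "alice@example.com",
    "noemail.com",
    "incorrectemail.com",
    "bob@example.com",
    "pretendemail.com",
    "wrongemail.com",
    "carol@example.com",
]

# 1. The problematic pattern: removing from the list being iterated
#    makes the loop skip the element that follows each removal.
buggy = list(sample)
for address in buggy:
    if is_invalid(address):
        buggy.remove(address)

# 2. Iterate over a shallow copy (via_copy[:]) and remove from the original.
via_copy = list(sample)
for address in via_copy[:]:
    if is_invalid(address):
        via_copy.remove(address)

# 3. Slice assignment: replace the contents of the same list object,
#    so any other reference to it sees the filtered result.
in_place = list(sample)
in_place[:] = [address for address in in_place if not is_invalid(address)]

print(buggy)     # second entry of each invalid pair survives
print(via_copy)  # only the valid addresses remain
print(in_place)  # same result as via_copy
```

The slice assignment form may be convenient if target_email_list stays global, since it changes the existing list object rather than rebinding the name.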