Background:
I am writing a little script which requires, as one of its arguments, a file containing a list of email addresses. The script will then go on to use the email addresses over a telnet connection to an SMTP server, so they need to be syntactically valid; consequently I have added a function to check email address validity (incidentally, this regex may not be perfect, but it is not the focus of the question, so please bear with me; it will probably be loosened up):
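import re

def checkmailsyntax(email):
    # Returns True when the address does NOT match (i.e. it is invalid); otherwise returns None.
    match = re.match(r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$', email)
    if match is None:
        return True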
The main() program goes on to read the input filename as an argument (in argparse) and read the file's lines into a (currently global) list:
with open(args.targetfile) as targets:
    target_email_list = targets.readlines()
I figured it would be great for the script to automatically delete an email address from the list (rather than just telling you it was wrong, which is what it used to do) if the checkmailsyntax function failed. This cleaned list could then go on to submit syntactically valid email addresses to the SMTP server:
for i in target_email_list:
    if checkmailsyntax(i):
        target_email_list.remove(i)
Error-checking code that I have put in both before and after the delete element snippet to see if it's doing its job:
for i in target_email_list:
    print i
The issue: The output of the code is thus:
Before delete element snippet (and the entire contents of the file submitted):
[email protected]
[email protected]
[email protected]
noemail.com
incorrectemail.com
[email protected]
pretendemail.com
wrongemail.com
[email protected]
badlywrong.com
[email protected]
After delete element snippet:
[email protected]
[email protected]
[email protected]
incorrectemail.com
[email protected]
wrongemail.com
[email protected]
[email protected]
So I'm pretty stumped as to why 'noemail.com', 'pretendemail.com' and 'badlywrong.com' were removed and yet 'incorrectemail.com' and 'wrongemail.com' were not. It seems to occur when there are two syntactically incorrect emails in the file sequentially.
Can anyone point me in the right direction?
It is because you are removing elements from the list while iterating over it:
for i in target_email_list:
    if checkmailsyntax(i):
        target_email_list.remove(i)  # here
Consider these two values, which are adjacent in the list:
pretendemail.com
wrongemail.com
Once you remove pretendemail.com, the next item, wrongemail.com, shifts up into the position that was just vacated. The iterator tracks its position by index, so it moves on to the following index and behaves as if wrongemail.com had already been visited. The item that comes next is [email protected], and wrongemail.com is never checked for valid syntax. You can add print(i) before checking the syntax and see for yourself.
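The skipping can be reproduced in isolation. The following is only a minimal sketch: the sample addresses are made up, and is_invalid is a hypothetical stand-in for checkmailsyntax that simply looks for an "@". It records every item the loop actually visits, so the skipped entry is visible.

```python
# Hypothetical stand-in for checkmailsyntax: True means the entry is invalid.
def is_invalid(entry):
    return "@" not in entry

# Made-up sample data with two invalid entries in a row.
emails = ["alice@example.com", "noemail.com", "incorrectemail.com", "bob@example.org"]

visited = []
for entry in emails:
    visited.append(entry)      # record what the loop really sees
    if is_invalid(entry):
        emails.remove(entry)   # mutating the list while iterating over it

print(visited)  # "incorrectemail.com" is missing: it was never visited
print(emails)   # so it survives in the list
```

Removing "noemail.com" moves "incorrectemail.com" into index 1, but the loop has already moved on to index 2, which is why the second of two consecutive invalid entries always survives.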
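You can use a list comprehension for this purpose. Note that checkmailsyntax returns True for an invalid address, so the condition has to be negated to keep the valid ones:
valid_emails = [email for email in target_email_list if not checkmailsyntax(email)]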
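If the existing list has to be updated in place (for example because it is global, as in the question), two common options are to loop over a copy, or to assign the filtered result back through a slice. The sketch below assumes made-up sample data and a checkmailsyntax that follows the question's convention (a truthy result means invalid), rewritten here to return a plain boolean.

```python
import re

# Same pattern as in the question, written as a raw string.
PATTERN = re.compile(r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$')

def checkmailsyntax(email):
    # Same convention as the question: True means the address is INVALID.
    return PATTERN.match(email) is None

# Made-up sample data, with trailing newlines as readlines() would leave them.
sample = ["alice@example.com\n", "noemail.com\n", "incorrectemail.com\n", "bob@example.org\n"]

# Option 1: iterate over a shallow copy and remove from the original.
by_copy = list(sample)
for entry in by_copy[:]:
    if checkmailsyntax(entry):
        by_copy.remove(entry)

# Option 2: slice assignment replaces the contents but keeps the same list object.
by_slice = list(sample)
same_object = by_slice
by_slice[:] = [entry for entry in by_slice if not checkmailsyntax(entry)]

print(by_copy)
print(by_slice)
```

Both options leave only the two valid addresses. The slice assignment is usually the cheaper of the two, since remove() rescans the list on every call.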