
How to efficiently select entries by date in python?

I have a list of emails and their dates. I can use two nested for loops to select the emails sent on the same date, but how can I do this the 'smart way', i.e. efficiently?

# list_emails_dates is a list of (email, date) tuples
list_one_date_emails = []
for entry in list_emails_dates:
    current_date = entry[1]
    for next_entry in list_emails_dates:
        if current_date == next_entry[1]:
            list_one_date_emails.append(next_entry)

I know this can be written more concisely, but I'm not familiar with itertools. Could map or xrange help here?

Solution

You can convert this to a dictionary, collecting all emails sent on a given date under that date as the key.

The easiest way to do this is with defaultdict from the collections module, which gives every new key in the dictionary a default value.

Here we pass list as the default factory, so each new key automatically starts out as an empty list.
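A small sketch of that behavior (the address is a hypothetical placeholder). It also shows the plain-dict equivalent using dict.setdefault, for comparison:

```python
from collections import defaultdict

# With defaultdict(list), accessing a missing key creates it with list(), i.e. []
emails = defaultdict(list)
emails["2014-07-01"].append("alice@example.com")  # no KeyError: the key starts as []
print(emails["2014-07-01"])  # ['alice@example.com']
print(emails["2014-07-02"])  # [] (note: this lookup also inserts the key)

# The same grouping with a plain dict needs setdefault (or an explicit "in" check)
plain = {}
plain.setdefault("2014-07-01", []).append("alice@example.com")
print(plain)  # {'2014-07-01': ['alice@example.com']}
```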

from collections import defaultdict

# Group emails under their date: {date: [email, email, ...]}
emails = defaultdict(list)
for email, email_date in list_of_tuples:
    emails[email_date].append(email)

Now emails['2014-07-01'] is the list of all emails sent on that date.
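An end-to-end sketch with hypothetical sample data. The list is walked only once, so building the index is O(n), compared with O(n²) for the nested loops in the question; after that, each date lookup is a single dictionary access:

```python
from collections import defaultdict

# Hypothetical sample data: (email, date) tuples
list_of_tuples = [
    ("alice@example.com", "2014-07-01"),
    ("bob@example.com", "2014-07-01"),
    ("carol@example.com", "2014-07-03"),
]

emails = defaultdict(list)
for email, email_date in list_of_tuples:
    emails[email_date].append(email)  # one pass: append each email under its date

print(emails["2014-07-01"])  # ['alice@example.com', 'bob@example.com']
print(dict(emails))
# {'2014-07-01': ['alice@example.com', 'bob@example.com'], '2014-07-03': ['carol@example.com']}
```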

If you use a plain dictionary comprehension instead of a defaultdict, like this:

emails = {x[1]:x[0] for x in list_of_tuples}

You'll get one entry per date, holding only the last email for that date, because assigning to an existing key overwrites its value. A dictionary is the most efficient way to look up a value by key (average O(1)). A list is good when you want to look up a value by its position in a sequence (assuming you know that position); finding all entries for a date in a list means scanning every element.
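A quick sketch of that overwriting, using the same hypothetical sample data as above:

```python
# Hypothetical sample data: (email, date) tuples
list_of_tuples = [
    ("alice@example.com", "2014-07-01"),
    ("bob@example.com", "2014-07-01"),
    ("carol@example.com", "2014-07-03"),
]

# Each date key is assigned once per matching tuple; only the last assignment survives
last_only = {x[1]: x[0] for x in list_of_tuples}
print(last_only)  # {'2014-07-01': 'bob@example.com', '2014-07-03': 'carol@example.com'}
```

alice@example.com is lost because bob@example.com was assigned to the same key afterwards.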

If for some reason you can't restructure your data into a dictionary, you can use a generator function like this one instead (note that it scans the whole list on every call):

def find_by_date(haystack, needle):
    for email, email_date in haystack:
        if email_date == needle:
            yield email

Here is how you would use it:

>>> email_list = [('[email protected]','2014-07-01'), ('[email protected]', '2014-07-01'), ('[email protected]', '2014-07-03')] 
>>> all_emails = list(find_by_date(email_list, '2014-07-01'))
>>> all_emails
['[email protected]', '[email protected]']

Or you can consume the generator lazily, one result at a time:

>>> july_first = find_by_date(email_list, '2014-07-01')
>>> next(july_first)
'[email protected]'
>>> next(july_first)
'[email protected]'
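Since the question mentions itertools: another option is itertools.groupby. This is an alternative technique, not the approach above, sketched here with hypothetical data. groupby only merges *consecutive* items with equal keys, so the list has to be sorted by date first, which makes this O(n log n) rather than the single O(n) pass of the defaultdict version:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, deliberately not in date order
list_of_tuples = [
    ("alice@example.com", "2014-07-01"),
    ("carol@example.com", "2014-07-03"),
    ("bob@example.com", "2014-07-01"),
]

by_date = itemgetter(1)  # key function: the date field of each (email, date) tuple

# Sort by date so equal dates are adjacent, then collect each run into a list
grouped = {
    date: [email for email, _ in group]
    for date, group in groupby(sorted(list_of_tuples, key=by_date), key=by_date)
}
print(grouped)
# {'2014-07-01': ['alice@example.com', 'bob@example.com'], '2014-07-03': ['carol@example.com']}
```

Because sorted is stable, emails that share a date keep their original relative order.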