Tags: python, dictionary, python-itertools, xrange

How to efficiently select entries by date in Python?


I have a list of emails and dates. I can use two nested for loops to pick out the emails sent on the same date, but how can I do it the "smart" way, that is, efficiently?

# list_emails_dates is a list of tuples: (email, date)
list_one_date_emails = []

for entry in list_emails_dates:
    current_date = entry[1]
    for next_entry in list_emails_dates:
        if current_date == next_entry[1]:
            list_one_date_emails.append(next_entry)

I know it can be written more concisely, but I don't know itertools. Or should I use map or xrange?


Solution

  • You can convert the list to a dictionary that collects all emails for a given date under the same key.

    To do this, you can use defaultdict from collections. It is an easy way to give every new key in a dictionary a default value.

    Here we are passing in the function list, so that each new key in the dictionary will get a list as the default value.

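    from collections import defaultdict

    # Key by date; each date maps to the list of emails sent on it.
    emails = defaultdict(list)
    for email, email_date in list_of_tuples:
        emails[email_date].append(email)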
    

    Now a lookup such as emails['2014-07-01'] gives you the list of emails for that date.
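
As a runnable sketch of the grouping described above: the sample addresses and the name emails_by_date are hypothetical, chosen only for illustration, and dates are assumed to be plain strings.

```python
from collections import defaultdict

# Hypothetical sample data: (email, date) tuples with dates as strings.
list_of_tuples = [
    ("alice@example.com", "2014-07-01"),
    ("bob@example.com", "2014-07-01"),
    ("carol@example.com", "2014-07-03"),
]

# One pass over the list; each date collects its emails in a list.
emails_by_date = defaultdict(list)
for email, email_date in list_of_tuples:
    emails_by_date[email_date].append(email)

print(emails_by_date["2014-07-01"])  # ['alice@example.com', 'bob@example.com']

# .get() returns a fallback without adding the missing date as a new key.
print(emails_by_date.get("2014-07-02", []))  # []
```

One thing to keep in mind: indexing a defaultdict with a date that is not present creates an empty list under that key, so .get() is the safer choice for read-only lookups.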

    If we don't use a defaultdict, and do a dictionary comprehension like this:

    emails = {x[1]:x[0] for x in list_of_tuples}
    

    You'll have one entry for each date, and its value will be only the last email for that date, since assigning to an existing key overwrites its value. A dictionary is the most efficient way to look up a value by key. A list is good if you want to look up a value by its position in a series of values (assuming you know the position).
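
To make the overwriting concrete, here is a small sketch with hypothetical sample data. It also shows dict.setdefault as one way to get the grouping with a plain dictionary; that alternative is an addition for illustration, not part of the original answer.

```python
# Hypothetical sample data: (email, date) tuples.
list_of_tuples = [
    ("alice@example.com", "2014-07-01"),
    ("bob@example.com", "2014-07-01"),
    ("carol@example.com", "2014-07-03"),
]

# The comprehension keeps a single value per key, so the later
# email for 2014-07-01 replaces the earlier one.
last_email_by_date = {email_date: email for email, email_date in list_of_tuples}
print(last_email_by_date["2014-07-01"])  # bob@example.com

# A plain dict can still group: setdefault inserts an empty list
# the first time a date is seen, then returns the stored list.
grouped = {}
for email, email_date in list_of_tuples:
    grouped.setdefault(email_date, []).append(email)
print(grouped["2014-07-01"])  # ['alice@example.com', 'bob@example.com']
```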

    If for some reason you cannot restructure the data into a dictionary, you can use a generator function like this one, which scans the list and yields the emails that match a date:

    def find_by_date(haystack, needle):
        for email, email_date in haystack:
            if email_date == needle:
                yield email
    

    Here is how you would use it (the addresses below are placeholders):

    >>> email_list = [('alice@example.com', '2014-07-01'), ('bob@example.com', '2014-07-01'), ('carol@example.com', '2014-07-03')]
    >>> all_emails = list(find_by_date(email_list, '2014-07-01'))
    >>> all_emails
    ['alice@example.com', 'bob@example.com']
    

    Or you can pull the matches one at a time with next:

    >>> july_first = find_by_date(email_list, '2014-07-01')
    >>> next(july_first)
    'alice@example.com'
    >>> next(july_first)
    'bob@example.com'
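
A generator like this can only be consumed once, and calling next on it after the last match raises StopIteration. The sketch below, using hypothetical placeholder addresses, shows how passing a default to next avoids the exception.

```python
def find_by_date(haystack, needle):
    """Yield every email whose date equals needle."""
    for email, email_date in haystack:
        if email_date == needle:
            yield email

# Hypothetical sample data: (email, date) tuples.
email_list = [
    ("alice@example.com", "2014-07-01"),
    ("bob@example.com", "2014-07-01"),
    ("carol@example.com", "2014-07-03"),
]

july_first = find_by_date(email_list, "2014-07-01")
first = next(july_first, None)   # 'alice@example.com'
second = next(july_first, None)  # 'bob@example.com'
third = next(july_first, None)   # None: no matches left, no exception

# The generator is now exhausted, so nothing remains to collect.
remaining = list(july_first)     # []
```

Each call to find_by_date walks the whole list again, so if you need lookups for many dates, building the dictionary once is likely to be cheaper.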