
Group dictionary by key and find max value


I have a dictionary with datetime objects as keys and lists of user IDs as values. The length of each list is the number of users active at that time of day.

The dictionary looks like:

2016-03-09 12:13:24 [34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L, 35167L, 35180L]
2016-03-09 12:16:49 [34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L, 35167L, 35187L]
2016-03-09 12:17:14 [34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L, 35167L, 35187L]
2016-03-09 12:21:39 [34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L, 35167L]
2016-03-09 12:22:01 [34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L, 35188L]
2016-03-09 12:23:08 [34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L, 35188L]
2016-03-09 12:23:37 [35191L, 34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L]
2016-03-09 12:24:05 [35191L, 34941L, 34943L, 35183L, 35028L, 35031L, 35081L, 35091L]

What I want to do is to make a dictionary which will contain the maximum number of users for each day. Something like:

2016-03-07: 25
2016-03-08: 38
2016-03-09: 12
2016-03-10: 29

EDIT: I want to find the peak of each day.

So I need to find the length of each list of values, group the entries by the date of the key, and finally find the maximum length within each group.

Finding the length of the list is the easy part with something like:

count_by_time = {}
for time, user_ids in sorted(users_by_time.items()):
    count_by_time[time] = len(user_ids)  # number of active users at this timestamp

But I am struggling with the grouping.

How can both the grouping and the max calculation be done, ideally in the most efficient and Pythonic way?


Solution

  • Getting the peak for each day is straightforward:

    from collections import defaultdict
    
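    # Running maximum of concurrent users per calendar day (unseen days start at 0).
    max_count_by_day = defaultdict(int)
    for dt, user_ids in users_by_time.items():  # .items() works on Python 2 and 3
        d = dt.date()
        max_count_by_day[d] = max(max_count_by_day[d], len(user_ids))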
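
Since the question is tagged python-itertools, here is a hedged sketch of the same peak-per-day calculation using itertools.groupby. It assumes Python 3, and the sample users_by_time data is made up for illustration. groupby only merges adjacent items, so the entries are sorted by timestamp first.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical sample data: timestamp -> list of active user IDs.
users_by_time = {
    datetime(2016, 3, 9, 12, 13, 24): [34941, 34943, 35183],
    datetime(2016, 3, 9, 12, 21, 39): [34941, 34943],
    datetime(2016, 3, 10, 8, 0, 0): [35191, 34941, 34943, 35183],
    datetime(2016, 3, 10, 9, 30, 0): [35191],
}

# groupby only groups consecutive items, so sort by timestamp first.
ordered = sorted(users_by_time.items(), key=lambda item: item[0])

# For each calendar day, take the largest list length seen that day.
max_count_by_day = {
    day: max(len(user_ids) for _, user_ids in group)
    for day, group in groupby(ordered, key=lambda item: item[0].date())
}

for day, peak in sorted(max_count_by_day.items()):
    print(day, peak)
# prints:
# 2016-03-09 3
# 2016-03-10 4
```

The defaultdict loop makes a single pass and needs no sorting, so it is likely the better choice for large dictionaries; the groupby version is mainly useful when the data is already ordered by time.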
    

    For the number of distinct users per day, collect the IDs into a defaultdict(set) so that duplicates within a day are counted once:

    # Union of all user IDs seen on each calendar day.
    users_in_day = defaultdict(set)
    for dt, user_ids in users_by_time.items():  # .items() works on Python 2 and 3
        users_in_day[dt.date()].update(user_ids)
    

    Then reduce that dictionary to a date: count mapping:

    usercount_per_day = {d: len(user_ids) for d, user_ids in users_in_day.items()}
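
To make the difference between the two metrics concrete, the following minimal sketch computes both in one pass. It assumes Python 3, and the sample data is invented for illustration: on the first day three users are active at each timestamp, but four different users appear over the day.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: timestamp -> list of active user IDs.
users_by_time = {
    datetime(2016, 3, 9, 12, 13, 24): [34941, 34943, 35180],
    datetime(2016, 3, 9, 12, 16, 49): [34941, 34943, 35187],
    datetime(2016, 3, 10, 8, 0, 0): [35191],
}

users_in_day = defaultdict(set)      # day -> set of distinct user IDs
max_count_by_day = defaultdict(int)  # day -> peak concurrent users

for dt, user_ids in users_by_time.items():
    day = dt.date()
    users_in_day[day].update(user_ids)
    max_count_by_day[day] = max(max_count_by_day[day], len(user_ids))

# Reduce the sets to counts.
usercount_per_day = {day: len(ids) for day, ids in users_in_day.items()}

for day in sorted(usercount_per_day):
    print(day, "peak:", max_count_by_day[day], "distinct:", usercount_per_day[day])
# prints:
# 2016-03-09 peak: 3 distinct: 4
# 2016-03-10 peak: 1 distinct: 1
```

The peak answers the question as asked; the distinct count is only needed if the goal is daily unique users.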