
Iterator produced by itertools.groupby() is consumed unexpectedly


I have written a small program based on iterators to display a multicolumn calendar.

In that code I use itertools.groupby in the function group_by_months() to group the dates by month. There I yield the month name and the grouped dates for every month. When I yield the grouped dates as a list, everything works. However, when I yield them directly as an iterator (instead of a list), the program leaves the days of all but the last column blank.

I can't figure out why that might be. Am I using groupby wrong? Can anyone help me to spot the place where the iterator is consumed or its output is ignored? Why is it especially the last column that "survives"?

Here's the code:

import datetime
from itertools import zip_longest, groupby

def grouper(iterable, n, fillvalue=None):
    """\
    copied from the docs:
    https://docs.python.org/3.4/library/itertools.html#itertools-recipes
    """
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def generate_dates(start_date, end_date, step=datetime.timedelta(days=1)):
    while start_date < end_date:
        yield start_date
        start_date += step

def group_by_months(seq):
    for k,v in groupby(seq, key=lambda x:x.strftime("%B")):
        yield k, v # Why does it only work when list(v) is yielded here?

def group_by_weeks(seq):
    yield from groupby(seq, key=lambda x:x.strftime("%2U"))

def format_month(month, dates_of_month):
    def format_week(weeknum, dates_of_week):
        def format_day(d):
            return d.strftime("%3e")
        weekdays = {d.weekday(): format_day(d) for d in dates_of_week}
        return "{0} {7} {1} {2} {3} {4} {5} {6}".format(
            weeknum, *[weekdays.get(i, "   ") for i in range(7)])
    yield "{:^30}".format(month)
    weeks = group_by_weeks(dates_of_month)
    yield from map(lambda x:format_week(*x), weeks)

start, end = datetime.date(2016,1,1), datetime.date(2017,1,1)
dates = generate_dates(start, end)
months = group_by_months(dates)
formatted_months = map(lambda x: (format_month(*x)), months)
ncolumns = 3
quarters = grouper(formatted_months, ncolumns)
interleaved = map(lambda x: zip_longest(*x, fillvalue=" "*30), quarters)
formatted = map(lambda x: "\n".join(map("   ".join, x)), interleaved)
list(map(print, formatted))

This is the failing output:

           January                          February                          March             
                                                                  09           1   2   3   4   5
                                                                  10   6   7   8   9  10  11  12
                                                                  11  13  14  15  16  17  18  19
                                                                  12  20  21  22  23  24  25  26
                                                                  13  27  28  29  30  31        
            April                             May                              June             
                                                                  22               1   2   3   4
                                                                  23   5   6   7   8   9  10  11
                                                                  24  12  13  14  15  16  17  18
                                                                  25  19  20  21  22  23  24  25
                                                                  26  26  27  28  29  30        
             July                            August                         September           
                                                                  35                   1   2   3
                                                                  36   4   5   6   7   8   9  10
                                                                  37  11  12  13  14  15  16  17
                                                                  38  18  19  20  21  22  23  24
                                                                  39  25  26  27  28  29  30    
           October                          November                         December           
                                                                  48                   1   2   3
                                                                  49   4   5   6   7   8   9  10
                                                                  50  11  12  13  14  15  16  17
                                                                  51  18  19  20  21  22  23  24
                                                                  52  25  26  27  28  29  30  31

This is the expected output:

           January                          February                          March             
00                       1   2   05       1   2   3   4   5   6   09           1   2   3   4   5
01   3   4   5   6   7   8   9   06   7   8   9  10  11  12  13   10   6   7   8   9  10  11  12
02  10  11  12  13  14  15  16   07  14  15  16  17  18  19  20   11  13  14  15  16  17  18  19
03  17  18  19  20  21  22  23   08  21  22  23  24  25  26  27   12  20  21  22  23  24  25  26
04  24  25  26  27  28  29  30   09  28  29                       13  27  28  29  30  31        
05  31                                                                                          
            April                             May                              June             
13                       1   2   18   1   2   3   4   5   6   7   22               1   2   3   4
14   3   4   5   6   7   8   9   19   8   9  10  11  12  13  14   23   5   6   7   8   9  10  11
15  10  11  12  13  14  15  16   20  15  16  17  18  19  20  21   24  12  13  14  15  16  17  18
16  17  18  19  20  21  22  23   21  22  23  24  25  26  27  28   25  19  20  21  22  23  24  25
17  24  25  26  27  28  29  30   22  29  30  31                   26  26  27  28  29  30        
             July                            August                         September           
26                       1   2   31       1   2   3   4   5   6   35                   1   2   3
27   3   4   5   6   7   8   9   32   7   8   9  10  11  12  13   36   4   5   6   7   8   9  10
28  10  11  12  13  14  15  16   33  14  15  16  17  18  19  20   37  11  12  13  14  15  16  17
29  17  18  19  20  21  22  23   34  21  22  23  24  25  26  27   38  18  19  20  21  22  23  24
30  24  25  26  27  28  29  30   35  28  29  30  31               39  25  26  27  28  29  30    
31  31                                                                                          
           October                          November                         December           
39                           1   44           1   2   3   4   5   48                   1   2   3
40   2   3   4   5   6   7   8   45   6   7   8   9  10  11  12   49   4   5   6   7   8   9  10
41   9  10  11  12  13  14  15   46  13  14  15  16  17  18  19   50  11  12  13  14  15  16  17
42  16  17  18  19  20  21  22   47  20  21  22  23  24  25  26   51  18  19  20  21  22  23  24
43  23  24  25  26  27  28  29   48  27  28  29  30               52  25  26  27  28  29  30  31

Solution

  • As the itertools.groupby() docs state:

    when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list

    That means a group's iterator yields nothing once the groupby() object has been advanced past that group. Here the group iterators are not read in the order in which groupby() produces them: the chunking and interleaving done in this code pull several months out of groupby() before the dates of the first one are read.

    We observe this specific pattern (i.e., only the last column is fully displayed) because of the way we iterate. That is:

    1. The month names for the first line are printed. To get there, groupby() has to be advanced up to the last column's month, because it produces that month's name only after the earlier columns' data. Advancing consumes the iterators of the earlier columns (and discards their content).

    2. We print the first week line. The iterators of the first columns are already exhausted, so zip_longest() pads those columns with its fillvalue. Only the last column still provides actual data.

    3. The same happens for every following row of months.
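
The effect described above can be reproduced in a few lines. The following is only a minimal sketch on toy data, not the calendar program itself: the word list and the names first_letter and group_eagerly are made up for illustration. The two next() calls stand in for what the grouper() recipe does when it pulls several months at once.

```python
from itertools import groupby

# Toy data standing in for the dates; already sorted by the grouping key.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

def first_letter(word):
    return word[0]

# Lazy variant: take two groups first, read them afterwards.
grouped = groupby(words, key=first_letter)
key_a, group_a = next(grouped)
key_b, group_b = next(grouped)  # advancing skips the rest of group "a"

lazy_a = list(group_a)  # empty: groupby() has already moved past "a"
lazy_b = list(group_b)  # the current group is still readable

# Eager variant (hypothetical helper): store each group as a list
# before groupby() is advanced, as the docs suggest.
def group_eagerly(seq, key):
    for k, group in groupby(seq, key=key):
        yield k, list(group)

eager = group_eagerly(words, first_letter)
_, eager_a = next(eager)
_, eager_b = next(eager)

print(lazy_a, lazy_b)    # [] ['banana', 'blueberry']
print(eager_a, eager_b)  # ['apple', 'avocado'] ['banana', 'blueberry']
```

Applied to the question, this corresponds to yielding k, list(v) in group_by_months(): each month's dates are stored before groupby() moves on, at the cost of holding one month of dates in memory at a time.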