
Split list on None and record index


I have a list that can contain both None values and datetime objects. I need to split it into sublists of consecutive datetime objects, and for each sublist record the index of its first datetime object in the original list.

E.g., I need to be able to turn

from datetime import datetime
original = [None, datetime(2013, 6, 4), datetime(2014, 5, 12), None, None, datetime(2012, 5, 18), None]

into:

(1, [datetime.datetime(2013, 6, 4, 0, 0), datetime.datetime(2014, 5, 12, 0, 0)])
(5, [datetime.datetime(2012, 5, 18, 0, 0)])

I have tried two approaches. One using find:

binary = ''.join('1' if d else '0' for d in original)
end = 0
start = binary.find('1', end)
while start > -1:
    end = binary.find('0', start)
    if end < 0:
        end = len(binary)
    dates = original[start:end]
    print (start, dates)
    start = binary.find('1', end)

and one using groupby:

from itertools import groupby
for key, group in groupby(enumerate(original), lambda x: x[1] is not None):
    if key:
        group = list(group)
        start = group[0][0]
        dates = [t[1] for t in group]
        print (start, dates)

Neither seems particularly Pythonic to me. Is there a better way?


Solution

  • I'd use a generator to produce the elements, encapsulating the grouping:

    from itertools import takewhile
    
    def indexed_date_groups(it):
        indexed = enumerate(it)
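        for i, elem in indexed:
            if elem is not None:
                # Collect the rest of the run from the same iterator; takewhile()
                # also consumes the None that ends the run.
                yield (
                    i, [elem] + [v for _, v in takewhile(
                        lambda pair: pair[1] is not None, indexed)])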
    

    Here itertools.takewhile() produces the rest of the sublist once we find an initial non-None object.
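
The takewhile() version relies on the for loop and takewhile() pulling from the same enumerate iterator. A minimal sketch of that behaviour, using short strings as stand-in values (the sample data is illustrative, not from the question):

```python
from itertools import takewhile

data = [None, 'a', 'b', None, 'c']   # illustrative stand-in values
indexed = enumerate(data)

# Advance the shared iterator to the first non-None element.
first = next(pair for pair in indexed if pair[1] is not None)

# takewhile() continues from where the iterator stopped; it also
# consumes the first failing element (the None at index 3).
rest = list(takewhile(lambda pair: pair[1] is not None, indexed))

# Whatever remains is what an outer for loop would see next.
remaining = list(indexed)
```

The None that ends a run is swallowed by takewhile(), which is harmless here because None entries are discarded anyway.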

    You can do the same with itertools.groupby() still, of course:

    from itertools import groupby
    
    def indexed_date_groups(it):
        for key, group in groupby(enumerate(it), lambda v: v[1] is not None):
            if key:
                indices, elems = zip(*group)
                # zip() produces tuples; convert so each group is a list
                yield indices[0], list(elems)
    

    Demo:

    >>> from datetime import datetime
    >>> original = [None, datetime(2013, 6, 4), datetime(2014, 5, 12), None, None, datetime(2012, 5, 18), None]
    >>> list(indexed_date_groups(original))
    [(1, [datetime.datetime(2013, 6, 4, 0, 0), datetime.datetime(2014, 5, 12, 0, 0)]), (5, [datetime.datetime(2012, 5, 18, 0, 0)])]
    >>> for index, group in indexed_date_groups(original):
    ...     print(index, group)
    ... 
    1 [datetime.datetime(2013, 6, 4, 0, 0), datetime.datetime(2014, 5, 12, 0, 0)]
    5 [datetime.datetime(2012, 5, 18, 0, 0)]
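
For anyone who wants to paste and run this, here is a self-contained Python 3 sketch of the groupby() approach. It builds each group with a list comprehension instead of zip(), which is a stylistic substitution rather than the exact code from the answer, and the variable name result is only illustrative:

```python
from datetime import datetime
from itertools import groupby


def indexed_date_groups(it):
    """Yield (start_index, [items]) for each run of consecutive non-None items."""
    for key, group in groupby(enumerate(it), lambda pair: pair[1] is not None):
        if key:
            # Materialise the run so the first index and the values can both be read.
            group = list(group)
            yield group[0][0], [value for _, value in group]


original = [None, datetime(2013, 6, 4), datetime(2014, 5, 12),
            None, None, datetime(2012, 5, 18), None]

result = list(indexed_date_groups(original))
for index, dates in result:
    print(index, dates)
```

An empty list yields no groups, and a run that reaches the end of the list is still returned, since groupby() closes the last group when the input is exhausted.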