Getting range of first element depending on following elements in list


I am struggling with the following. Basically I have a list:

dolist = [(1280, ['A1'], ['A2']), (1278, ['A1'], ['A2']), (1276, ['A1'], ['A2']), (1274, ['B1'], ['B2']), (1272, ['A1'], ['A2']), (1270, [], ['A2'])]

Now I want the first elements grouped by elements 2 and 3, giving two lists like these:

uniqdo = [ (['A1'],['A2']), (['B1'],['B2']),([],['A2']) ]
dorange = [ "1280-1276,1272","1274","1270" ]

I have tried to do this with straightforward comparisons, but the code becomes very long, with several tests, and looks a bit messy. There must be library functions that can do this reasonably quickly.


Solution

  • It looks like itertools.groupby could help you:

    >>> dolist = [ (1280,['A1'],['A2']),(1278,['A1'],['A2']),(1276,['A1'],['A2']),(1274,['B1'],['B2']),(1272,['A1'],['A2']) ]
    >>> from itertools import groupby
    >>> [[v, [i for i,*_ in g]] for v, g in groupby(dolist, key= lambda l: (l[1][0], l[2][0]))]
    [[('A1', 'A2'), [1280, 1278, 1276]], [('B1', 'B2'), [1274]], [('A1', 'A2'), [1272]]]
    

    It shouldn't be hard to convert the above data structure to the one you want.

    Here's a start. You cannot use the lists themselves as the grouping key, because a Python list is not hashable and so cannot be used as a dict key. Instead, get_value returns the first item of each list, or None for an empty list:

    from itertools import groupby
    
    dolist = [(1280, ['A1'], ['A2']), (1278, ['A1'], ['A2']), (1276, ['A1'], ['A2']), (1274, ['B1'], ['B2']), (1272, ['A1'], ['A2']), (1270, [], ['A2'])]
    ranges = {}
    
    
    def get_value(l):
        if l:
            return l[0]
        else:
            return None
    
    
    def get_values(t):
        return (get_value(t[1]), get_value(t[2]))
    
    for v, g in groupby(dolist, get_values):
        ids = [str(t[0]) for t in g]
        if len(ids) > 1:
            range_str = ids[0] + '-' + ids[-1]
        else:
            range_str = ids[0]
        ranges.setdefault(v, []).append(range_str)
    
    print(ranges)
    # {('A1', 'A2'): ['1280-1276', '1272'], ('B1', 'B2'): ['1274'], (None, 'A2'): ['1270']}
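
To get from that dict to the two lists asked for in the question, one possible sketch follows. It is a variation on the code above, not the answerer's own method: the helper names group_key and build_ranges are made up for illustration, and the key uses tuples instead of get_value so that an empty list survives as an empty tuple rather than None. It also assumes Python 3.7+, where dicts keep insertion order. As in the code above, a "run" is simply a stretch of consecutive entries with the same key; nothing checks that the numbers are evenly spaced.

```python
from itertools import groupby

dolist = [
    (1280, ['A1'], ['A2']), (1278, ['A1'], ['A2']), (1276, ['A1'], ['A2']),
    (1274, ['B1'], ['B2']), (1272, ['A1'], ['A2']), (1270, [], ['A2']),
]


def group_key(t):
    # Hypothetical helper: tuples are hashable, so they can be dict keys,
    # and an empty list simply becomes an empty tuple.
    return (tuple(t[1]), tuple(t[2]))


def build_ranges(items):
    # Hypothetical helper: map each key to the list of its consecutive runs.
    ranges = {}
    for key, group in groupby(items, key=group_key):
        ids = [str(t[0]) for t in group]
        # A single entry stays as-is; a longer run becomes "first-last".
        run = ids[0] if len(ids) == 1 else ids[0] + '-' + ids[-1]
        ranges.setdefault(key, []).append(run)
    return ranges


ranges = build_ranges(dolist)

# Turn the tuple keys back into lists and join each key's runs with commas.
uniqdo = [(list(a), list(b)) for a, b in ranges]
dorange = [','.join(runs) for runs in ranges.values()]

print(uniqdo)
# [(['A1'], ['A2']), (['B1'], ['B2']), ([], ['A2'])]
print(dorange)
# ['1280-1276,1272', '1274', '1270']
```

Because groupby only groups adjacent items, the dict is what merges the two separate ('A1', 'A2') runs under one key.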