Search code examples
pythoniteratorpython-itertools

How to iterate over two sorted lists in largest pairs order in Python


I have two sorted iterables, e.g.:

a = [C, B, A]
b = [3, 2, 1]

I want to generate a list of all possible pairs, combining the largest (lowest index) pairs first. Largest means all combinations of element < 1 first (for both iterables), then all combinations of index < 2, etc. Desired result:

combine(a, b)
>> ((C, 3), (B, 3), (C, 2), (B, 2), (A, 3), (A, 2), (C, 1), (B, 1), (A, 1))

I looked at itertools.product(), but this generates the pairs in the wrong order.

Since the inputs are iterables (not necessarily lists), I'd prefer a library function that can handle iterables. The iterables may be long (millions of elements), so it would be desirable to avoid storing them all in memory. For the same reason, generating in the wrong order and then sorting is undesirable.

The output should be a generator, so that it's not necessary to iterate over all the combinations if not all are required.


Solution

  • Using the roundrobin recipe that Karl mentioned (copied verbatim from the recipes, could also import it from more-itertools). I think this will be faster, since all the hard work is done in C code of various itertools.

    from itertools import repeat, chain, cycle, islice
    
    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        # Recipe credited to George Sakkis
        num_active = len(iterables)
        nexts = cycle(iter(it).__next__ for it in iterables)
        while num_active:
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                # Remove the iterator we just exhausted from the cycle.
                num_active -= 1
                nexts = cycle(islice(nexts, num_active))
    
    def pairs(a, b):
        aseen = []
        bseen = []
        def agen():
            for aa in a:
                aseen.append(aa)
                yield zip(repeat(aa), bseen)
        def bgen():
            for bb in b:
                bseen.append(bb)
                yield zip(aseen, repeat(bb))
        return chain.from_iterable(roundrobin(agen(), bgen()))
    
    a = ['C', 'B', 'A']
    b = [3, 2, 1]
    print(*pairs(a, b))
    

    Output (Try it online!):

    ('C', 3) ('B', 3) ('C', 2) ('B', 2) ('A', 3) ('A', 2) ('C', 1) ('B', 1) ('A', 1)
    

    Benchmark with two iterables of 2000 elements each (Try it online!):

     50 ms   50 ms   50 ms  Kelly
    241 ms  241 ms  242 ms  Karl
    

    Alternatively, if the two iterables can be iterated multiple times, we don't need to save what we've seen (Try it online!):

    def pairs(a, b):
        def agen():
            for i, x in enumerate(a):
                yield zip(repeat(x), islice(b, i))
        def bgen():
            for i, x in enumerate(b, 1):
                yield zip(islice(a, i), repeat(x))
        return chain.from_iterable(roundrobin(agen(), bgen()))
    

    (Will add to the benchmark later... Should be a little slower than my first solution.)

    An extreme map/itertools version of that (Try it online!):

    def pairs(a, b):
        return chain.from_iterable(roundrobin(
            map(zip,
                map(repeat, a),
                map(islice, repeat(b), count())),
            map(zip,
                map(islice, repeat(a), count(1)),
                map(repeat, b))
        ))