Search code examples
pythonlistsumnested-listsbuilt-in

Python built-in sum for nested list


When I finished leetcode 1313, I find a special usage of built-in sum function.

The Leetcode Problem 1313

We are given a list nums of integers representing a list compressed with run-length encoding.

Consider each adjacent pair of elements [a, b] = [nums[2*i], nums[2*i+1]] (with i >= 0).  For each such pair, there are a elements with value b in the decompressed list.

Return the decompressed list.

 

Example 1:

Input: nums = [1,2,3,4]
Output: [2,4,4,4]
Explanation: The first pair [1,2] means we have freq = 1 and val = 2 so we generate the array [2].
The second pair [3,4] means we have freq = 3 and val = 4 so we generate [4,4,4].
At the end the concatenation [2] + [4,4,4,4] is [2,4,4,4].

There is a solution using sum

nums = [1,2,3,4]
g = ([b] * a for a, b in zip(nums[::2], nums[1::2]))
print(list(g))
g = ([b] * a for a, b in zip(nums[::2], nums[1::2]))
print(sum(g,[]))

Output:

[[2], [4, 4, 4]]
[2, 4, 4, 4]

My question

I can't tell why sum can deal with a nested list in this situation. Can any one tell me about it? Or some other functions behavior like this?

Here is the official guide for built-in sum.


Solution

  • Short-answer

    The given code-fragment runs successive list concatenations.

    How it works

    Roughly the built-in sum() function works like this:

    def sum(iterable, /, start=0):
        total = start
        for x in iterable:
            total = total + x
        return total
    

    The + operator calls __add__ on the left-hand operand so that 3 + 4 runs as (3).__add__(4), an addition operation on integers. Likewise, [10, 20] + [30, 40, 50] runs as [10, 20].__add__([30, 40, 50]), a concatenation operation on lists.

    Here's how it plays out in the given example:

    >>> nums = [1,2,3,4]
    >>> g = ([b] * a for a, b in zip(nums[::2], nums[1::2]))
    >>> result = []
    >>> x = next(g)
    >>> result = result + x
    >>> result
    [2]
    >>> x = next(g)
    >>> result = result + x
    >>> result
    [2, 4, 4, 4]
    

    Why it is not a good idea

    Successive list concatenations make next list after each addition, so they run at O(n**2) speed, meaning that this a quadratic algorithm that runs excessively slow when given large inputs.

    Better way

    Instead of building new lists at each step, just extend the base list in-place. This runs at O(n) linear speed:

    >>> nums = [1,2,3,4]
    >>> g = ([b] * a for a, b in zip(nums[::2], nums[1::2]))
    >>> result = []                 # new list
    >>> for x in g:
            result.extend(x)        # extend in-place
    
    >>> result
    [2, 4, 4, 4]
    

    Even better way

    The itertools module provides a tool for chaining together iterators. This makes short-work of the problem:

    >>> nums = [1,2,3,4]
    >>> g = ([b] * a for a, b in zip(nums[::2], nums[1::2]))
    >>> list(chain.from_iterable(g))
    [2, 4, 4, 4]
    

    This solution is short, fast and works well even with large inputs.