Search code examples
pythonarraysnumpyvectorization

How to efficiently concatenate many arange calls in numpy?


I'd like to vectorize calls like numpy.arange(0, cnt_i) over a vector of cnt values and concatenate the results like this snippet:

import numpy
cnts = [1,2,3]
numpy.concatenate([numpy.arange(cnt) for cnt in cnts])

array([0, 0, 1, 0, 1, 2])

Unfortunately the code above is very memory inefficient due to the temporary arrays and list comprehension looping.

Is there a way to do this more efficiently in numpy?


Solution

  • Here's a completely vectorized function:

    def multirange(counts):
        counts = np.asarray(counts)
        # Remove the following line if counts is always strictly positive.
        counts = counts[counts != 0]
    
        counts1 = counts[:-1]
        reset_index = np.cumsum(counts1)
    
        incr = np.ones(counts.sum(), dtype=int)
        incr[0] = 0
        incr[reset_index] = 1 - counts1
    
        # Reuse the incr array for the final result.
        incr.cumsum(out=incr)
        return incr
    

    Here's a variation of @Developer's answer that only calls arange once:

    def multirange_loop(counts):
        counts = np.asarray(counts)
        ranges = np.empty(counts.sum(), dtype=int)
        seq = np.arange(counts.max())
        starts = np.zeros(len(counts), dtype=int)
        starts[1:] = np.cumsum(counts[:-1])
        for start, count in zip(starts, counts):
            ranges[start:start + count] = seq[:count]
        return ranges
    

    And here's the original version, written as a function:

    def multirange_original(counts):
        ranges = np.concatenate([np.arange(count) for count in counts])
        return ranges
    

    Demo:

    In [296]: multirange_original([1,2,3])
    Out[296]: array([0, 0, 1, 0, 1, 2])
    
    In [297]: multirange_loop([1,2,3])
    Out[297]: array([0, 0, 1, 0, 1, 2])
    
    In [298]: multirange([1,2,3])
    Out[298]: array([0, 0, 1, 0, 1, 2])
    

    Compare timing using a larger array of counts:

    In [299]: counts = np.random.randint(1, 50, size=50)
    
    In [300]: %timeit multirange_original(counts)
    10000 loops, best of 3: 114 µs per loop
    
    In [301]: %timeit multirange_loop(counts)
    10000 loops, best of 3: 76.2 µs per loop
    
    In [302]: %timeit multirange(counts)
    10000 loops, best of 3: 26.4 µs per loop