python pandas numpy python-itertools

What is the quickest way to iterate through a numpy array?


I noticed a meaningful speed difference between iterating through a numpy array "directly" and iterating through the list returned by its tolist method. These are the two versions I timed:

directly
[i for i in np.arange(10000000)]
via tolist
[i for i in np.arange(10000000).tolist()]



Given that I've already found one way to go faster, I wanted to ask what else might speed it up.

What is the fastest way to iterate through a numpy array?


Solution

  • These are my timings on a slower machine:

    In [1034]: timeit [i for i in np.arange(10000000)]
    1 loop, best of 3: 2.16 s per loop
    

    If I generate the range directly (Py3, so this is a lazy range object, not a list), times are much better. Take this as a baseline for a list comprehension of this size.

    In [1035]: timeit [i for i in range(10000000)]
    1 loop, best of 3: 1.26 s per loop
    

    tolist converts the arange to a list first; that takes a bit longer than the baseline, but the iteration is still on a list.

    In [1036]: timeit [i for i in np.arange(10000000).tolist()]
    1 loop, best of 3: 1.6 s per loop
    

    Using list() takes the same time as direct iteration on the array, which suggests that direct iteration effectively does the same thing first.

    In [1037]: timeit [i for i in list(np.arange(10000000))]
    1 loop, best of 3: 2.18 s per loop
    
    In [1038]: timeit np.arange(10000000).tolist()
    1 loop, best of 3: 927 ms per loop
    

    list() on its own takes about the same time as the whole comprehension over .tolist():

    In [1039]: timeit list(np.arange(10000000))
    1 loop, best of 3: 1.55 s per loop
    

    In general, if you must loop, working on a list is faster, because access to the elements of a list is simpler.
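
To reproduce the comparison outside IPython, here is a minimal sketch using the standard timeit module. The helper names (direct, via_tolist, via_list) and the smaller size N are my own choices for illustration. Unlike the transcripts above, the array is created once outside the timed code, so absolute numbers will not match.

```python
import timeit

import numpy as np

N = 100_000  # much smaller than the 10,000,000 used above, so this runs quickly


def direct(a):
    # Iterate over the array itself; each item is a NumPy scalar.
    return [i for i in a]


def via_tolist(a):
    # Convert to a list of Python ints in compiled code, then iterate.
    return [i for i in a.tolist()]


def via_list(a):
    # list(a) iterates over the array, so items are still NumPy scalars.
    return [i for i in list(a)]


a = np.arange(N)

# All three produce the same values; only element types and speed differ.
assert direct(a) == via_tolist(a) == via_list(a)

# Best of 3 repeats, each repeat running the function 5 times.
timings = {
    f.__name__: min(timeit.repeat(lambda: f(a), number=5, repeat=3))
    for f in (direct, via_tolist, via_list)
}
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.4f} s for 5 runs")
```

The ranking may vary with machine, Python version, and numpy version, so treat the printed numbers as something to measure rather than assume.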

    Look at the elements returned by indexing.

    a[0] is another numpy object; it is constructed from the values in a, rather than being a value that is simply fetched.

    list(a)[0] is the same type; the list is just [a[0], a[1], a[2]].

    In [1043]: a = np.arange(3)
    In [1044]: type(a[0])
    Out[1044]: numpy.int32
    In [1045]: ll=list(a)
    In [1046]: type(ll[0])
    Out[1046]: numpy.int32
    

    But tolist converts the array into a pure Python list, in this case a list of ints. It does more work than list(), but does it in compiled code.

    In [1047]: ll=a.tolist()
    In [1048]: type(ll[0])
    Out[1048]: int
    

    In general, don't use list(anarray). It rarely does anything useful, and is not as powerful as tolist().
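
A short sketch of the difference between list() and tolist(), including a 2-D case. The variable names are illustrative, and the exact NumPy integer type (int32 or int64) depends on the platform.

```python
import numpy as np

a = np.arange(3)

from_list = list(a)        # same as [a[0], a[1], a[2]]
from_tolist = a.tolist()   # converted to built-in Python objects

# list() keeps NumPy scalars (int32 or int64, depending on the platform).
print(type(from_list[0]), isinstance(from_list[0], np.integer))
# tolist() yields plain Python ints.
print(type(from_tolist[0]))

# tolist() converts every level of nesting; list() only unpacks the first axis.
b = np.arange(6).reshape(2, 3)
nested = b.tolist()        # list of lists of Python ints
rows = list(b)             # list of two 1-D arrays
print(nested, type(rows[0]))
```

For a multi-dimensional array, list() therefore leaves you with arrays inside a list, which is rarely what you want when the goal is plain Python data.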

    What's the fastest way to iterate through an array? There isn't one, at least not in Python; in C code there are fast ways.

    a.tolist() is the fastest, vectorized way of creating a list of integers from an array. It iterates, but does so in compiled code.

    But what is your real goal?
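
If the real goal is an elementwise computation, it can often be written as a whole-array expression so that no Python-level loop is needed. A minimal sketch, assuming a hypothetical computation x*x + 1 that is not part of the question:

```python
import numpy as np

a = np.arange(1000)

# Python-level loop over a list of plain ints.
looped = [x * x + 1 for x in a.tolist()]

# Whole-array expression: the iteration happens inside numpy's compiled code.
vectorized = a * a + 1

# Both approaches give the same values.
assert vectorized.tolist() == looped
```

Whether this applies depends on what the loop body actually does; not every loop can be expressed this way.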