Tags: python, arrays, numpy, performance, append

appending array to an array: fastest way


I have two arrays of arrays:

array1 = np.array([np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])],dtype=object)
array2 = np.array([np.array([9, 8]), np.array([0]), np.array([12])],dtype=object)

I need to create a new array of arrays, by appending corresponding individual arrays. I should get:

array_final=np.array([np.array([1,2,3,9,8]),np.array([4,5,6,0]),np.array([7,8,9,12])],dtype=object)

Which is the fastest way? I need to do this operation millions of times.


Solution

  • Your 2 arrays:

    In [2]: array1 = np.array([np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])],dtype=object)
       ...: array2 = np.array([np.array([9, 8]), np.array([0]), np.array([12])],dtype=object)
    

    But look at the first:

    In [3]: array1
    Out[3]: 
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]], dtype=object)    
    In [4]: array1.shape
    Out[4]: (3, 3)
    

    Because the subarrays all have the same length, np.array builds a 2d array of shape (3,3), not (3,); specifying object dtype doesn't change that.

    The other is (3,), since the subarrays differ in shape:

    In [5]: array2
    Out[5]: array([array([9, 8]), array([0]), array([12])], dtype=object)    
    In [6]: array2.shape
    Out[6]: (3,)
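
For comparison, a (3,) object array can also be built from equal-length subarrays by preallocating the container and filling it one slot at a time. This is only a minimal sketch; the helper name to_object_array is hypothetical and not part of NumPy.

```python
import numpy as np

def to_object_array(subarrays):
    """Pack a sequence of arrays into a 1-D object array, one array per slot.

    Hypothetical helper for illustration; not a NumPy function.
    """
    out = np.empty(len(subarrays), dtype=object)
    for i, sub in enumerate(subarrays):
        out[i] = sub  # integer indexing stores the array object itself in slot i
    return out

array1 = to_object_array(
    [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
)
print(array1.shape)  # (3,)
```

Each slot then holds an ordinary integer array, which is what the question's description of an "array of arrays" suggests was intended.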
    

    Let's make lists instead of arrays:

    In [7]: list1 = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
       ...: list2 = [np.array([9, 8]), np.array([0]), np.array([12])]
    

    Now we can do a list comprehension, joining each pair of subarrays:

    In [8]: [np.hstack((a,b)) for a,b in zip(list1, list2)]
    Out[8]: [array([1, 2, 3, 9, 8]), array([4, 5, 6, 0]), array([ 7,  8,  9, 12])]
    

    And if necessary make an array from that:

    In [9]: np.array(_, object)
    Out[9]: 
    array([array([1, 2, 3, 9, 8]), array([4, 5, 6, 0]),
           array([ 7,  8,  9, 12])], dtype=object)
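
The two steps above can be folded into one helper. What follows is a sketch under assumptions: both inputs are sequences holding the same number of 1-D arrays, the name join_pairs is hypothetical, and np.concatenate stands in for np.hstack (for 1-D inputs the two give the same result). The output is filled slot by slot because np.array(result, object) would return a 2d array whenever all the joined rows happen to have the same length.

```python
import numpy as np

def join_pairs(seq1, seq2):
    """Join corresponding 1-D arrays and return them in a 1-D object array.

    Hypothetical helper for illustration; not a NumPy function.
    """
    joined = [np.concatenate((a, b)) for a, b in zip(seq1, seq2)]
    out = np.empty(len(joined), dtype=object)
    for i, row in enumerate(joined):
        out[i] = row  # store each joined array as a single object
    return out

list1 = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
list2 = [np.array([9, 8]), np.array([0]), np.array([12])]

array_final = join_pairs(list1, list2)
print(array_final.shape)  # (3,)
```

If a plain list of arrays is good enough for the later processing, the final packing step can be dropped and only the list comprehension kept.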
    

    If I try the same thing with the original arrays, the result is similar but not identical. zip iterates over the rows of the 2d array1, and since those rows have object dtype, so do the joined arrays:

    In [10]: [np.hstack((a,b)) for a,b in zip(array1, array2)]
    Out[10]: 
    [array([1, 2, 3, 9, 8], dtype=object),
     array([4, 5, 6, 0], dtype=object),
     array([7, 8, 9, 12], dtype=object)]
    

    Or if I first convert the object dtype array1 to int:

    In [11]: [np.hstack((a,b)) for a,b in zip(array1.astype(int), array2)]
    Out[11]: [array([1, 2, 3, 9, 8]), array([4, 5, 6, 0]), array([ 7,  8,  9, 12])]
    

    Object dtype arrays are little more than glorified (or debased?) lists: they contain pointers to objects stored elsewhere in memory. So iterating over one is basically the same as iterating over a list, and if anything a bit slower.
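
One way to check that claim on a particular machine is a small timeit comparison. This is a sketch only: the function names are hypothetical, the repeat count is arbitrary, and no timings are quoted here because they depend on the NumPy version and hardware.

```python
import timeit
import numpy as np

list1 = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
list2 = [np.array([9, 8]), np.array([0]), np.array([12])]

# 1-D object-array versions of the same data, filled slot by slot
obj1 = np.empty(len(list1), dtype=object)
obj2 = np.empty(len(list2), dtype=object)
for i, (a, b) in enumerate(zip(list1, list2)):
    obj1[i] = a
    obj2[i] = b

def join_from_lists():
    # iterate over plain Python lists of arrays
    return [np.concatenate((a, b)) for a, b in zip(list1, list2)]

def join_from_object_arrays():
    # iterate over 1-D object arrays holding the same subarrays
    return [np.concatenate((a, b)) for a, b in zip(obj1, obj2)]

t_list = timeit.timeit(join_from_lists, number=10_000)
t_obj = timeit.timeit(join_from_object_arrays, number=10_000)
print(f"lists:         {t_list:.4f} s")
print(f"object arrays: {t_obj:.4f} s")
```

Both functions return the same joined arrays, so any difference in the printed times comes from the container being iterated rather than from the join itself.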