
Is zip() the most efficient way to combine arrays with respect to memory in numpy?


I use NumPy and have two arrays, which I read with genfromtxt.

According to np.shape(), both have the shape (10000,).

I want to combine these two vectors into one array with the shape (10000, 2). For now I use:

x = zip(x1, x2)

but I'm not sure whether there is a NumPy function that does this better or more efficiently. I don't think concatenate does what I want (or I'm using it wrong).


Solution

  • There is numpy.column_stack:

    >>> import numpy
    >>> a = numpy.arange(10)
    >>> b = numpy.arange(1, 11)
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> b
    array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
    >>> numpy.column_stack((a, b))
    array([[ 0,  1],
           [ 1,  2],
           [ 2,  3],
           [ 3,  4],
           [ 4,  5],
           [ 5,  6],
           [ 6,  7],
           [ 7,  8],
           [ 8,  9],
           [ 9, 10]])
    >>> numpy.column_stack((a, b)).shape
    (10, 2)
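Other NumPy functions can build the same (n, 2) array. numpy.stack with axis=1 makes each input a column, and numpy.concatenate also works once each 1-D array has been given an explicit column axis. Calling concatenate directly on two 1-D arrays joins them end to end, which is likely why it did not seem to do what you wanted. A short sketch using the same small arange arrays:

```python
import numpy as np

a = np.arange(10)
b = np.arange(1, 11)

# concatenate on 1-D inputs joins them end to end: shape (20,), not (10, 2)
joined = np.concatenate((a, b))
print(joined.shape)  # (20,)

# np.stack inserts a new axis; axis=1 turns each input into a column
stacked = np.stack((a, b), axis=1)
print(stacked.shape)  # (10, 2)

# concatenate along axis 1 works once each input is an (n, 1) column
cols = np.concatenate((a[:, np.newaxis], b[:, np.newaxis]), axis=1)

# All of these give the same values as column_stack
reference = np.column_stack((a, b))
assert np.array_equal(stacked, reference)
assert np.array_equal(cols, reference)
# column_stack(tup) is also equivalent to vstack(tup).T for 1-D inputs
assert np.array_equal(np.vstack((a, b)).T, reference)
```

Which one to use is mostly a matter of taste; column_stack is the most direct spelling when the inputs are 1-D vectors.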
    

    I haven't measured whether this beats zip in memory usage or speed, but under the hood it relies on numpy.concatenate (which is implemented in C) and produces a single (n, 2) array rather than a Python tuple per row, so that's at least encouraging:

    >>> import inspect
    >>> print(inspect.getsource(numpy.column_stack))
    def column_stack(tup):
        """
        Stack 1-D arrays as columns into a 2-D array.
    
        Take a sequence of 1-D arrays and stack them as columns
        to make a single 2-D array. 2-D arrays are stacked as-is,
        just like with `hstack`.  1-D arrays are turned into 2-D columns
        first.
    
        Parameters
        ----------
        tup : sequence of 1-D or 2-D arrays.
            Arrays to stack. All of them must have the same first dimension.
    
        Returns
        -------
        stacked : 2-D array
            The array formed by stacking the given arrays.
    
        See Also
        --------
        hstack, vstack, concatenate
    
        Notes
        -----
        This function is equivalent to ``np.vstack(tup).T``.
    
        Examples
        --------
        >>> a = np.array((1,2,3))
        >>> b = np.array((2,3,4))
        >>> np.column_stack((a,b))
        array([[1, 2],
               [2, 3],
               [3, 4]])
    
        """
        arrays = []
        for v in tup:
            arr = array(v, copy=False, subok=True)
            if arr.ndim < 2:
                arr = array(arr, copy=False, subok=True, ndmin=2).T
            arrays.append(arr)
        return _nx.concatenate(arrays, 1)
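To get a rough feel for the memory question, the sketch below compares the column_stack result with what zip produces. It assumes Python 3, where zip returns a lazy iterator, so getting an actual (n, 2) array out of it means materialising a list of tuples first (np.array(list(zip(x1, x2)))). The random float64 data is an assumed stand-in for the genfromtxt output, and the sys.getsizeof total is only an approximation of the Python objects involved. Note also that the source printed above comes from an older NumPy release; current versions of column_stack are written slightly differently but still finish with a call to concatenate.

```python
import sys

import numpy as np

n = 10_000
rng = np.random.default_rng(0)
# Assumed stand-ins for the two genfromtxt results: float64, shape (n,)
x1 = rng.random(n)
x2 = rng.random(n)

# One contiguous (n, 2) float64 buffer: 16 bytes of data per row
stacked = np.column_stack((x1, x2))

# zip is lazy in Python 3; materialising it builds n tuples whose items
# are NumPy scalar objects, each one a separate Python object
pairs = list(zip(x1, x2))


def approx_size_of_pairs(pairs):
    """Rough footprint: the list, every tuple, and every scalar inside."""
    total = sys.getsizeof(pairs)
    for t in pairs:
        total += sys.getsizeof(t)
        total += sum(sys.getsizeof(item) for item in t)
    return total


print("column_stack data bytes:", stacked.nbytes)
print("list of tuples (approx):", approx_size_of_pairs(pairs))

# Converting the zipped pairs to an array gives the same values,
# but only after the intermediate list has been built
from_zip = np.array(pairs)
assert from_zip.shape == (n, 2)
assert np.array_equal(from_zip, stacked)
```

The exact byte counts depend on the Python and NumPy build, but the list-of-tuples route always pays for one Python object per tuple and per element on top of the final array.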