
Opposite of numpy.reduce


NumPy ufuncs have a reduce method (for example np.add.reduce) that collapses one dimension of an array by applying a function to the elements along that dimension. Is there an inverse of this operation that takes each single element and expands it into a whole new dimension?

Let's say I have these two arrays:

import numpy as np

class A:
    def __init__(self, a, b, c):
        self.value = a, b, c

a = np.array([A(1,2,3), A(4,5,6)])
b = np.array([1<<8 | 2<<4 | 3, 4<<8 | 5<<4 | 6])

And I would like to transform either of these two arrays into

np.array([[1,2,3],[4,5,6]])

What I'm currently doing is the following:

def expand(arr):
    a = np.empty((*arr.shape, 3))
    # Iterate over full index tuples so arrays of any dimensionality work
    for idx in np.ndindex(arr.shape):
        if isinstance(arr[idx], A):
            a[idx] = arr[idx].value
        else:
            a[idx] = arr[idx] >> 8, arr[idx] >> 4 & 0xF, arr[idx] & 0xF
    return a

This works, but I'd like to avoid (slow?) iteration since it goes against the spirit of numpy.

I've also tried a solution using np.vectorize, but it doesn't work as I'd want it to:

def expanded(element):
    if isinstance(element, A):
        return element.value
    return element >> 8, element >> 4 & 0xF, element & 0xF
f = np.vectorize(expanded)
f(a)  # Returns a tuple of arrays instead of the desired single array

Is there a better way to expand a single value to a new dimension, either via some mathematical operation or via object attribute access?


Solution

  • Your array of A instances.

    In [56]: a
    Out[56]: 
    array([<__main__.A object at 0x000001BE0C814220>,
           <__main__.A object at 0x000001BE138B8610>], dtype=object)
    

    Define a __repr__ to get a prettier display.

    Iterate on the instances, returning the value:

    In [59]: [np.array(i.value) for i in a]
    Out[59]: [array([1, 2, 3]), array([4, 5, 6])]
    

    Which can be turned into one array with:

    In [60]: np.vstack(_)
    Out[60]: 
    array([[1, 2, 3],
           [4, 5, 6]])
    

    or

    In [61]: np.array(__)
    Out[61]: 
    array([[1, 2, 3],
           [4, 5, 6]])
    

    np.vectorize makes one array for each value the function returns per instance, here 3 arrays:

    In [62]: np.vectorize(lambda i: i.value)(a)
    Out[62]: (array([1, 4]), array([2, 5]), array([3, 6]))
    

    which can be turned into the transpose of the previous array:

    In [63]: np.array(_)
    Out[63]: 
    array([[1, 4],
           [2, 5],
           [3, 6]])
    

    Returning an array (instead of a tuple), and specifying otypes, gives another object dtype array:

    In [64]: np.vectorize(lambda i: np.array(i.value), otypes=[object])(a)
    Out[64]: array([array([1, 2, 3]), array([4, 5, 6])], dtype=object)
    In [65]: np.vstack(_)
    Out[65]: 
    array([[1, 2, 3],
           [4, 5, 6]])
    

    Specifying a signature can produce the numeric array directly. I don't think this is any faster.

    In [66]: np.vectorize(lambda i: np.array(i.value), signature='()->(n)')(a)
    Out[66]: 
    array([[1, 2, 3],
           [4, 5, 6]])
    

    A slightly more primitive version of vectorize, np.frompyfunc, returns an object dtype array directly:

    In [68]: np.frompyfunc(lambda i: np.array(i.value), 1,1)(a)
    Out[68]: array([array([1, 2, 3]), array([4, 5, 6])], dtype=object)
    

    Comparative timings on this small example won't tell us much. You need to explore a more realistically sized a.

    Fast numpy code works with numeric dtypes. For object dtype, the speeds of all these approaches are roughly equivalent to a list comprehension.
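
The packed-integer array b from the question already has a numeric dtype, so it can be expanded without a Python-level loop. The following is a minimal sketch, not a definitive implementation: expand_packed is a hypothetical helper name, and it assumes the same layout as the question's expression (the top field is not masked, the lower two fields are 4 bits each).

```python
import numpy as np

def expand_packed(arr):
    """Unpack three fields from each integer into a new trailing axis.

    Hypothetical helper; mirrors the question's expression
    (x >> 8, x >> 4 & 0xF, x & 0xF) applied to the whole array at once.
    """
    arr = np.asarray(arr)
    # Each shift/mask is evaluated elementwise over the full array;
    # np.stack then joins the three results along a new last axis.
    return np.stack([arr >> 8, (arr >> 4) & 0xF, arr & 0xF], axis=-1)

b = np.array([1 << 8 | 2 << 4 | 3, 4 << 8 | 5 << 4 | 6])
result = expand_packed(b)
print(result)
```

Because the shifts and masks are ordinary elementwise operations, the same function should work for input of any shape; the result has shape (*arr.shape, 3).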