So in numpy we have a reduce function with which we can reduce one dimension of an array by applying a function to the elements across that dimension. Is there also an inverse of this function that would take a single element and expand it to a whole new dimension?
Let's say I have these two arrays:
import numpy as np

class A:
    def __init__(self, a, b, c):
        # Store the three components as a tuple
        self.value = a, b, c

a = np.array([A(1,2,3), A(4,5,6)])
b = np.array([1<<8 | 2<<4 | 3, 4<<8 | 5<<4 | 6])
And I would like to transform either of these two arrays into
np.array([[1,2,3],[4,5,6]])
What I'm currently doing is the following:
def expand(arr):
    a = np.empty((*arr.shape, 3))
    # Use the full index tuple so this works for any number of dimensions
    for idx in np.ndindex(arr.shape):
        if isinstance(arr[idx], A):
            a[idx] = arr[idx].value
        else:
            a[idx] = arr[idx] >> 8, arr[idx] >> 4 & 0xF, arr[idx] & 0xF
    return a
This works, but I'd like to avoid (slow?) iteration since it goes against the spirit of numpy.
I've also tried a solution using np.vectorize, but it doesn't work the way I want it to:
def expanded(element):
    if isinstance(element, A):
        return element.value
    return element >> 8, element >> 4 & 0xF, element & 0xF

f = np.vectorize(expanded)
f(a) # Prints a tuple of arrays instead of the desired single array
Is there a better way to expand a single value to a new dimension, either via some mathematical operation or via object attribute access?
Your array of A instances:
In [56]: a
Out[56]:
array([<__main__.A object at 0x000001BE0C814220>,
       <__main__.A object at 0x000001BE138B8610>], dtype=object)
Define a __repr__ to get a prettier display.
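As a minimal sketch of what such a __repr__ could look like (the exact format string is an assumption, any readable form would do):

```python
import numpy as np

class A:
    def __init__(self, a, b, c):
        self.value = a, b, c

    def __repr__(self):
        # Hypothetical display format: the class name followed by the stored tuple
        return f"A{self.value}"

a = np.array([A(1, 2, 3), A(4, 5, 6)])
print(repr(a[0]))  # A(1, 2, 3)
print(repr(a))     # the object array now shows each element via __repr__
```

The array is still object dtype; only the way each element is displayed changes.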
Iterate on the instances, returning the value of each:
In [59]: [np.array(i.value) for i in a]
Out[59]: [array([1, 2, 3]), array([4, 5, 6])]
Which can be turned into one array with:
In [60]: np.vstack(_)
Out[60]:
array([[1, 2, 3],
       [4, 5, 6]])
or
In [61]: np.array(__)
Out[61]:
array([[1, 2, 3],
       [4, 5, 6]])
vectorize makes one array for each value the function returns per instance, here 3 arrays:
In [62]: np.vectorize(lambda i: i.value)(a)
Out[62]: (array([1, 4]), array([2, 5]), array([3, 6]))
which can be turned into the transpose of the previous array:
In [63]: np.array(_)
Out[63]:
array([[1, 4],
       [2, 5],
       [3, 6]])
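If the goal is the (2, 3) layout rather than its transpose, one option is to stack the tuple of arrays along a new last axis. This is a small sketch, not part of the session above:

```python
import numpy as np

class A:
    def __init__(self, a, b, c):
        self.value = a, b, c

a = np.array([A(1, 2, 3), A(4, 5, 6)])

# vectorize returns a tuple of 3 arrays, one per returned value
cols = np.vectorize(lambda i: i.value)(a)

# Stacking along a new last axis puts the 3 values of each instance side by side
out = np.stack(cols, axis=-1)
print(out)
# [[1 2 3]
#  [4 5 6]]
```

Because the new axis is appended last, the same call should also work when a has more than one dimension.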
Returning an array (instead of a tuple) and specifying otypes gives another object dtype array:
In [64]: np.vectorize(lambda i: np.array(i.value), otypes=[object])(a)
Out[64]: array([array([1, 2, 3]), array([4, 5, 6])], dtype=object)
In [65]: np.vstack(_)
Out[65]:
array([[1, 2, 3],
       [4, 5, 6]])
Passing signature lets vectorize produce the numeric array directly. I don't think this is any faster.
In [66]: np.vectorize(lambda i: np.array(i.value), signature='()->(n)')(a)
Out[66]:
array([[1, 2, 3],
       [4, 5, 6]])
A slightly more primitive version of vectorize, np.frompyfunc, returns an object dtype array directly:
In [68]: np.frompyfunc(lambda i: np.array(i.value), 1,1)(a)
Out[68]: array([array([1, 2, 3]), array([4, 5, 6])], dtype=object)
Comparative timings on this small example won't tell us much. You need to test with a more realistically sized a.
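A rough way to run such a comparison is sketched below. The array size n and the helper names are assumptions chosen for illustration, and the actual timings will depend on your machine and numpy version:

```python
import timeit
import numpy as np

class A:
    def __init__(self, a, b, c):
        self.value = a, b, c

# Hypothetical test size; adjust to match your real data
n = 10_000
big = np.array([A(i, i + 1, i + 2) for i in range(n)])

def with_list():
    # Plain list comprehension, then one conversion to a numeric array
    return np.array([x.value for x in big])

def with_vectorize():
    # np.vectorize with a signature builds the (n, 3) array directly
    f = np.vectorize(lambda x: np.array(x.value), signature='()->(n)')
    return f(big)

def with_frompyfunc():
    # np.frompyfunc returns an object array of small arrays; stack them
    f = np.frompyfunc(lambda x: np.array(x.value), 1, 1)
    return np.vstack(list(f(big)))

for fn in (with_list, with_vectorize, with_frompyfunc):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

All three functions return the same (n, 3) array, so only the printed timings differ.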
The fast numpy code works with numeric dtypes. For object dtype, the speeds of all these approaches are roughly equivalent to a list comprehension.
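The integer array b from the question is a numeric dtype, so it can be expanded without any Python-level loop. A minimal sketch, reusing the shifts and masks from the question (the function name unpack_fields is hypothetical):

```python
import numpy as np

b = np.array([1 << 8 | 2 << 4 | 3, 4 << 8 | 5 << 4 | 6])

def unpack_fields(arr):
    # Same arithmetic as the loop in the question, applied to the whole array at once;
    # the three results are stacked along a new last axis
    return np.stack([arr >> 8, (arr >> 4) & 0xF, arr & 0xF], axis=-1)

print(unpack_fields(b))
# [[1 2 3]
#  [4 5 6]]
```

Unlike the loop version, which fills a float array created by np.empty, this keeps the integer dtype of the input. It also works for inputs of any shape, adding one trailing dimension of length 3.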