I love using np.fromiter from numpy because it is a resource-lazy way to build np.array objects. However, it seems like it doesn't support multidimensional arrays, which are quite useful as well.
import numpy as np
def fun(i):
    """A function returning 4 values of the same type."""
    return tuple(4*i + j for j in range(4))
# Trying to create a 2-dimensional array from it:
a = np.fromiter((fun(i) for i in range(5)), '4i', 5) # fails
# This function only seems to work for 1D array, trying then:
a = np.fromiter((fun(i) for i in range(5)),
                [('', 'i'), ('', 'i'), ('', 'i'), ('', 'i')], 5) # painful
# .. `a` now looks like a 2D array but it is not:
a.transpose() # doesn't work as expected
a[0, 1] # too many indices (of course)
a[:, 1] # don't even think about it
How can I get `a` to be a multidimensional array while keeping such a lazy construction based on generators?
Short update on the question: as of NumPy 1.23, it is now possible to do exactly what is given in the example:
import numpy as np
def fun(i):
    """A function returning 4 values of the same type."""
    return tuple(4*i + j for j in range(4))
# Trying to create a 2-dimensional array from it:
a = np.fromiter((fun(i) for i in range(5)), dtype='4i', count=5)
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]], dtype=int32)
Personally, I find it more readable to pass the data types directly instead of using strings (note that 'i' results in int32 and not the default integer type, which is int64 on most platforms):
a = np.fromiter((fun(i) for i in range(5)), dtype=np.dtype((int, 4)), count=5)
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]])
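To check the point about the element type, here is a small sketch that inspects the two dtypes used above. It relies only on the `base` and `shape` attributes of `np.dtype`; the variable names `from_string` and `from_type` are illustrative and not part of the original example.

```python
import numpy as np

# '4i' is a subarray dtype: 4 elements of the C int type ('i').
from_string = np.dtype('4i')

# (int, 4) has the same shape, but is based on NumPy's default integer type.
from_type = np.dtype((int, 4))

# Print the element type and the per-item shape of each dtype.
print(from_string.base, from_string.shape)
print(from_type.base, from_type.shape)
```

Both dtypes describe items of shape `(4,)`; only the element type differs, and the exact default integer width depends on the platform and NumPy version.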
See also the documentation of fromiter, which contains a similar example.
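For NumPy versions older than 1.23, one possible workaround (a sketch, not taken from the text above) is to flatten the generator of tuples into a stream of scalars with `itertools.chain.from_iterable`, build a 1D array with `np.fromiter`, and reshape it. The names `n_rows`, `n_cols`, and `flat` are hypothetical and chosen only for illustration; the approach assumes every call to `fun` returns the same number of values.

```python
from itertools import chain

import numpy as np


def fun(i):
    """A function returning 4 values of the same type."""
    return tuple(4*i + j for j in range(4))


n_rows, n_cols = 5, 4

# Flatten the rows lazily: chain.from_iterable yields one scalar at a time,
# so no intermediate list of tuples is built.
flat = chain.from_iterable(fun(i) for i in range(n_rows))

# Build a 1D array from the scalars, then reshape it into n_rows x n_cols.
a = np.fromiter(flat, dtype=int, count=n_rows * n_cols).reshape(n_rows, n_cols)

# The result is a regular 2D array, so 2D indexing and transposing work.
print(a[0, 1])
print(a[:, 1])
print(a.transpose().shape)
```

Passing `count` lets `np.fromiter` allocate the output once instead of growing it, which keeps the construction memory-friendly.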