I have two unstructured NumPy arrays a and b with shapes (N,) and (N, 256, 2) respectively and dtype np.float. I wish to combine these into a single structured array with shape (N,) and dtype [('field1', np.float), ('field2', np.float, (256, 2))].
The documentation on this is surprisingly lacking. I've found methods like np.lib.recfunctions.merge_arrays but have not been able to find the precise combination of features required to do this.
For the sake of avoiding the XY problem, I'll state my wider aims.
I have a PyTables table with layout {"field1": tables.FloatCol(), "field2": tables.FloatCol(shape = (256, 2))}. The two NumPy arrays represent N new rows to be appended to each of these fields. N is large, so I wish to do this with a single efficient table.append(rows) call, rather than the slow process of looping through table.row['field'] = ....
The table.append documentation says:
The rows argument may be any object which can be converted to a structured array compliant with the table structure (otherwise, a ValueError is raised). This includes NumPy structured arrays, lists of tuples or array records, and a string or Python buffer.
Converting my arrays to an appropriate structured array seems to be what I should be doing here. I'm looking for speed, and I anticipate the other options being slower.
Define the dtype, and create an empty/zeros array:
In [163]: dt = np.dtype([('field1', np.float64), ('field2', np.float64, (4, 2))])  # np.float was removed in NumPy 1.24
In [164]: arr = np.zeros(3, dt) # float display is prettier
In [165]: arr
Out[165]:
array([(0., [[0., 0.], [0., 0.], [0., 0.], [0., 0.]]),
       (0., [[0., 0.], [0., 0.], [0., 0.], [0., 0.]]),
       (0., [[0., 0.], [0., 0.], [0., 0.], [0., 0.]])],
      dtype=[('field1', '<f8'), ('field2', '<f8', (4, 2))])
Assign values field by field:
In [166]: arr['field1'] = np.arange(3)
In [167]: arr['field2'].shape
Out[167]: (3, 4, 2)
In [168]: arr['field2'] = np.arange(24).reshape(3,4,2)
In [169]: arr
Out[169]:
array([(0., [[ 0.,  1.], [ 2.,  3.], [ 4.,  5.], [ 6.,  7.]]),
       (1., [[ 8.,  9.], [10., 11.], [12., 13.], [14., 15.]]),
       (2., [[16., 17.], [18., 19.], [20., 21.], [22., 23.]])],
      dtype=[('field1', '<f8'), ('field2', '<f8', (4, 2))])
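Applied to the arrays from the question, the same by-field assignment might look like the sketch below. The helper name combine and the shape check are illustrative assumptions, not part of NumPy or PyTables; the field names and the (256, 2) sub-array shape come from the question.

```python
import numpy as np

def combine(a, b):
    """Pack a (N,) array and a (N, ...) array into one structured array.

    Hypothetical helper: 'combine' is just an illustrative name.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.ndim != 1 or b.ndim < 2 or b.shape[0] != a.shape[0]:
        raise ValueError("expected a with shape (N,) and b with shape (N, ...)")
    # The sub-array shape of field2 is taken from the trailing axes of b.
    dt = np.dtype([('field1', np.float64),
                   ('field2', np.float64, b.shape[1:])])
    out = np.empty(a.shape[0], dtype=dt)
    out['field1'] = a   # one vectorized copy per field, no Python-level loop
    out['field2'] = b
    return out

N = 5
a = np.arange(N, dtype=np.float64)
b = np.arange(N * 256 * 2, dtype=np.float64).reshape(N, 256, 2)
rows = combine(a, b)
print(rows.shape, rows.dtype)
```

Since rows is a NumPy structured array whose dtype mirrors the table layout described in the question, it should be the kind of object that can be handed to a single table.append(rows) call.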
np.rec does have a function that works similarly:
In [174]: np.rec.fromarrays([np.arange(3.), np.arange(24).reshape(3,4,2)], dtype=dt)
Out[174]:
rec.array([(0., [[ 0.,  1.], [ 2.,  3.], [ 4.,  5.], [ 6.,  7.]]),
           (1., [[ 8.,  9.], [10., 11.], [12., 13.], [14., 15.]]),
           (2., [[16., 17.], [18., 19.], [20., 21.], [22., 23.]])],
          dtype=[('field1', '<f8'), ('field2', '<f8', (4, 2))])
The result holds the same data, except that it is a recarray, so fields can also be accessed as attributes. Under the covers it does the same by-field assignment.
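As a small illustration of that difference, the sketch below builds the same record array and reads a field both ways. The variable names rec and same are arbitrary choices for the example.

```python
import numpy as np

dt = np.dtype([('field1', np.float64), ('field2', np.float64, (4, 2))])
rec = np.rec.fromarrays(
    [np.arange(3.), np.arange(24.).reshape(3, 4, 2)], dtype=dt)

# Attribute access and key access refer to the same field data.
same = np.array_equal(rec.field1, rec['field1'])

# The sub-array field keeps its per-row shape behind the leading axis.
print(same, rec.field2.shape)

# A recarray is still an ndarray subclass, so regular indexing works too.
print(isinstance(rec, np.ndarray))
```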
numpy.lib.recfunctions is another collection of structured array functions. These too mostly follow the by-field assignment approach.