Combine two NumPy arrays into one structured array for appending to a PyTables table

I have two unstructured NumPy arrays, a and b, with shapes (N,) and (N, 256, 2) respectively, both of dtype np.float64. I want to combine them into a single structured array with shape (N,) and dtype [('field1', np.float64), ('field2', np.float64, (256, 2))].

The documentation on this is surprisingly lacking. I've found functions like np.lib.recfunctions.merge_arrays, but none of them seems to offer the precise combination of features this requires.

For the sake of avoiding the XY problem, I'll state my wider aims.

I have a PyTables table with layout {"field1": tables.FloatCol(), "field2": tables.FloatCol(shape = (256, 2))}. The two NumPy arrays represent N new rows to be appended to each of these fields. N is large, so I wish to do this with a single efficient table.append(rows) call, rather than the slow process of looping through table.row['field'] = ....

The table.append documentation says:

The rows argument may be any object which can be converted to a structured array compliant with the table structure (otherwise, a ValueError is raised). This includes NumPy structured arrays, lists of tuples or array records, and a string or Python buffer.

Converting my arrays to an appropriate structured array seems to be what I should be doing here. I'm looking for speed, and I anticipate the other options being slower.

Solution

Define the dtype, and create an empty/zeros array:

In [163]: dt = np.dtype([('field1', np.float64), ('field2', np.float64, (4, 2))])
In [164]: arr = np.zeros(3, dt)     # float display is prettier
In [165]: arr
Out[165]: 
array([(0., [[0., 0.], [0., 0.], [0., 0.], [0., 0.]]),
       (0., [[0., 0.], [0., 0.], [0., 0.], [0., 0.]]),
       (0., [[0., 0.], [0., 0.], [0., 0.], [0., 0.]])],
      dtype=[('field1', '<f8'), ('field2', '<f8', (4, 2))])

Assign values field by field:

In [166]: arr['field1'] = np.arange(3)                                                   
In [167]: arr['field2'].shape                                                            
Out[167]: (3, 4, 2)
In [168]: arr['field2'] = np.arange(24).reshape(3,4,2)                                   
In [169]: arr                                                                            
Out[169]: 
array([(0., [[ 0.,  1.], [ 2.,  3.], [ 4.,  5.], [ 6.,  7.]]),
       (1., [[ 8.,  9.], [10., 11.], [12., 13.], [14., 15.]]),
       (2., [[16., 17.], [18., 19.], [20., 21.], [22., 23.]])],
      dtype=[('field1', '<f8'), ('field2', '<f8', (4, 2))])
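
Applied to the arrays in the question, the same field-by-field assignment can be wrapped in a small helper. This is only a sketch: the function name combine and the variable rows are made up for illustration, the field names are taken from the question, and it assumes a is 1-D and b has the same length along its first axis.

```python
import numpy as np

def combine(a, b):
    """Pack a (N,) array and a (N, ...) array into one structured array.

    Hypothetical helper, not part of NumPy or PyTables.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.ndim != 1 or b.ndim < 2 or b.shape[0] != a.shape[0]:
        raise ValueError("expected a with shape (N,) and b with shape (N, ...)")
    # The per-row shape of field2 is whatever follows the first axis of b.
    dt = np.dtype([('field1', np.float64),
                   ('field2', np.float64, b.shape[1:])])
    out = np.empty(a.shape[0], dtype=dt)  # every field is overwritten below
    out['field1'] = a
    out['field2'] = b
    return out

# Small stand-ins for the real data; N is large in the actual use case.
N = 5
a = np.arange(N, dtype=np.float64)
b = np.arange(N * 256 * 2, dtype=np.float64).reshape(N, 256, 2)

rows = combine(a, b)
print(rows.shape, rows.dtype)
```

The resulting rows array has the shape and dtype asked for in the question, so it should be suitable for passing to a single table.append(rows) call, provided the table's column names and shapes match the dtype.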

np.rec does have a function that works similarly:

In [174]: np.rec.fromarrays([np.arange(3.), np.arange(24).reshape(3,4,2)], dtype=dt)     
Out[174]: 
rec.array([(0., [[ 0.,  1.], [ 2.,  3.], [ 4.,  5.], [ 6.,  7.]]),
           (1., [[ 8.,  9.], [10., 11.], [12., 13.], [14., 15.]]),
           (2., [[16., 17.], [18., 19.], [20., 21.], [22., 23.]])],
          dtype=[('field1', '<f8'), ('field2', '<f8', (4, 2))])

The result holds the same data, but because it is a recarray its fields can also be accessed as attributes, in addition to the usual arr['field1'] indexing. Under the covers, fromarrays does the same by-field assignment.
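
A short sketch of the difference, reusing the dtype from above. The variable names rec, same and plain are illustrative only. The last step views the record array as a plain structured ndarray, in case the recarray subclass is not wanted downstream.

```python
import numpy as np

dt = np.dtype([('field1', np.float64), ('field2', np.float64, (4, 2))])
rec = np.rec.fromarrays(
    [np.arange(3.), np.arange(24.).reshape(3, 4, 2)], dtype=dt)

# Attribute access and key access refer to the same field data.
same = np.array_equal(rec.field1, rec['field1'])

# View the recarray as an ordinary structured ndarray (no copy is made).
plain = rec.view(dt, np.ndarray)

print(same, type(plain).__name__, plain['field2'].shape)
```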

numpy.lib.recfunctions is another collection of structured array functions. These too mostly follow the by-field assignment approach.