
Broken structured to unstructured numpy array conversion in 1.16.0


I want to convert a NumPy structured array whose numeric columns all have the same type (np.float) into an unstructured array in NumPy 1.16.0.

Previously I did it like this:

array = np.ones((100,), dtype=[('user', np.object), ('item', np.float), ('value', np.float)])
array[['item','value']].view((np.float, 2))

NumPy 1.16.0 added the structured_to_unstructured function in numpy.lib.recfunctions.

However, for a view taken from an array that has object columns, both the new structured_to_unstructured and the old view approach raise TypeError: Cannot change data-type for object array.

Both work for views of a structured array that has no object columns at all, but fail when a view of only the numeric columns is taken from an array that also contains an object field.


Solution

  • NumPy 1.16 made a major change to how multifield views are handled. To get the earlier behavior, repack the selected fields with rf.repack_fields first.

    In [277]: import numpy.lib.recfunctions as rf 
    
    In [287]: arr = np.ones(3, dtype='O,f,f')                                                                    
    In [288]: arr                                                                                                
    Out[288]: 
    array([(1, 1., 1.), (1, 1., 1.), (1, 1., 1.)],
          dtype=[('f0', 'O'), ('f1', '<f4'), ('f2', '<f4')])
    In [289]: rf.structured_to_unstructured(arr[['f1','f2']])                                                    
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-289-8700aa9aacb4> in <module>
    ----> 1 rf.structured_to_unstructured(arr[['f1','f2']])
    
    /usr/local/lib/python3.6/dist-packages/numpy/lib/recfunctions.py in structured_to_unstructured(arr, dtype, copy, casting)
        969     with suppress_warnings() as sup:  # until 1.16 (gh-12447)
        970         sup.filter(FutureWarning, "Numpy has detected")
    --> 971         arr = arr.view(flattened_fields)
        972 
        973     # next cast to a packed format with all fields converted to new dtype
    
    /usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
        492 
        493     if newtype.hasobject or oldtype.hasobject:
    --> 494         raise TypeError("Cannot change data-type for object array.")
        495     return
        496 
    
    TypeError: Cannot change data-type for object array.
    

    Repacking before the conversion works:

    In [290]: rf.structured_to_unstructured(rf.repack_fields(arr[['f1','f2']]))                                  
    Out[290]: 
    array([[1., 1.],
           [1., 1.],
           [1., 1.]], dtype=float32)
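
For reference, the same fix can be written as a plain script against an array shaped like the one in the question. This is only a sketch: the sample rows are made up for illustration, and `object` and `np.float64` are used in place of the `np.object` and `np.float` aliases, on the assumption that the code may also be run on newer NumPy releases where those aliases no longer exist.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Illustrative data (not from the question): one object column, two float columns.
arr = np.array(
    [("alice", 1.0, 2.0), ("bob", 3.0, 4.0)],
    dtype=[("user", object), ("item", np.float64), ("value", np.float64)],
)

# Selecting several fields gives a multifield view of the original records.
numeric_view = arr[["item", "value"]]

# repack_fields copies the selection into a packed layout without the object
# column; the packed copy is then converted to a plain 2-D float array.
plain = rf.structured_to_unstructured(rf.repack_fields(numeric_view))

print(plain.shape, plain.dtype)
print(plain)
```

The result is an ordinary float array with one row per record and one column per selected field.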
    

    A multifield view preserves the underlying data layout. Notice the offsets and itemsize in this display: the object field is still present in memory, just not displayed.

    In [291]: arr[['f1','f2']]                                                                                   
    Out[291]: 
    array([(1., 1.), (1., 1.), (1., 1.)],
          dtype={'names':['f1','f2'], 'formats':['<f4','<f4'], 'offsets':[8,12], 'itemsize':16})
    

    repack_fields makes a copy that does not include the object field:

    In [292]: rf.repack_fields(arr[['f1','f2']])                                                                 
    Out[292]: array([(1., 1.), (1., 1.), (1., 1.)], dtype=[('f1', '<f4'), ('f2', '<f4')])
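
The difference between the view and the repacked copy can also be checked programmatically. The sketch below assumes NumPy 1.16 or later (where multifield indexing returns a view); the `offsets` helper is a hypothetical convenience defined here, not a NumPy function, and the sample values are made up.

```python
import numpy as np
import numpy.lib.recfunctions as rf

arr = np.array(
    [(None, 1.0, 2.0), (None, 3.0, 4.0)],
    dtype=[("f0", object), ("f1", "f4"), ("f2", "f4")],
)

view = arr[["f1", "f2"]]         # multifield view, keeps the original record layout
packed = rf.repack_fields(view)  # packed copy containing only f1 and f2


def offsets(a):
    """Byte offset of each field inside one record (hypothetical helper)."""
    return [a.dtype.fields[name][1] for name in a.dtype.names]


# The view keeps the itemsize of the full record, so room for f0 is still there.
same_itemsize = view.dtype.itemsize == arr.dtype.itemsize
print(same_itemsize, offsets(view))

# The packed copy holds just two 4-byte floats per record.
print(packed.dtype.itemsize, offsets(packed))

# Writing to the original shows up in the view but not in the packed copy.
arr["f1"][0] = 99.0
print(view["f1"][0], packed["f1"][0])
```

The exact offsets of the view depend on the platform's pointer size, so only the packed layout is fixed across machines.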
    

    The view approach has problems even if all fields are float:

    In [301]: arr = np.ones(3, dtype='f,f,f')                                                                    
    In [302]: arr[['f1','f2']].view(('f',2))                                                                     
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-302-68433a44bcfe> in <module>
    ----> 1 arr[['f1','f2']].view(('f',2))
    
    ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged
    In [303]: arr[['f1','f2']]                                                                                   
    Out[303]: 
    array([(1., 1.), (1., 1.), (1., 1.)],
          dtype={'names':['f1','f2'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
    In [304]: rf.repack_fields(arr[['f1','f2']]).view(('f',2))                                                   
    Out[304]: 
    array([[1., 1.],
           [1., 1.],
           [1., 1.]], dtype=float32)
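
If the view approach is preferred, the repack step can be wrapped in a small helper. This is a minimal sketch, not part of NumPy: `fields_to_2d` is a hypothetical name, and the check that all selected fields share one dtype is an added safeguard, since the view only makes sense in that case.

```python
import numpy as np
import numpy.lib.recfunctions as rf


def fields_to_2d(arr, names):
    """Return the selected fields of a 1-D structured array as a 2-D array.

    Hypothetical helper: repacks the multifield selection, then reinterprets
    the packed records as rows of len(names) values.
    """
    names = list(names)
    packed = rf.repack_fields(arr[names])
    field_dtypes = {packed.dtype[name] for name in names}
    if len(field_dtypes) != 1:
        raise TypeError("all selected fields must share one dtype")
    (field_dtype,) = field_dtypes
    # After repacking, itemsize equals len(names) * field_dtype.itemsize,
    # so viewing as a subarray type is allowed.
    return packed.view((field_dtype, len(names)))


arr = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], dtype="f4,f4,f4")
out = fields_to_2d(arr, ["f1", "f2"])
print(out)
```

Because repack_fields copies the data, the returned array does not share memory with the original, unlike the pre-1.16 view idiom.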