
ufunc (min, max, mean, etc.) on structured (record) arrays with different dtypes


I am working in Python 3.8 with numpy 1.20.3 and trying to apply simple reductions (min, max, mean) to a structured array whose fields have different data types.

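import numpy


def test_large_record():
    # numpy.float and numpy.int are deprecated aliases of the builtins
    # (removed in numpy 1.24), so the explicit sized types are used here.
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    # Reduces over whole records, not over individual fields.
    print(rec_array.min())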

This results in a "TypeError: cannot perform reduce with flexible type".

I tried to write a generator that walks a generic structured array, groups the fields by data type, and yields a view of each group as a plain array of that type, but that doesn't seem to work.

def rec_homogeneous_generator(rec_array):
    dtype = {}

    for name, dt in rec_array.dtype.descr:
        if dt not in dtype.keys():
            dtype[dt] = []

        dtype[dt].append(name)

    for dt, cols in dtype.items():
        r = rec_array[cols]
        v = r.view(dt)
        yield v


def test_large_record():
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    # Expect one minimum per field in each same-dtype group.
    for h_array in rec_homogeneous_generator(rec_array):
        print(h_array.min(axis=0))

This prints 0.0 and 0, which is not what I expected. I should get [0.0, 0.01] for the float fields and 1 for the integer field.

Anyone have any good ideas?


Solution

  • Operating on one field at a time:

    In [21]: [rec_array[field].min() for field in rec_array.dtype.fields]
    Out[21]: [0.0, 0.01, 1]
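
The same per-field idea can be wrapped so that any reduction is keyed by field name. This is only a minimal sketch: `reduce_fields` is a hypothetical helper name, not a numpy function, and the array is rebuilt here so the snippet runs on its own.

```python
import numpy as np

# Same data as in the question.
x = np.array([0.0, 0.2, 0.3])
x_2 = np.array([0.01, 0.12, 0.82])
y = np.array([1, 5, 7])
rec_array = np.rec.fromarrays(
    [x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')]
)


def reduce_fields(arr, func):
    """Apply func to each field separately; return {field name: result}.

    Hypothetical helper for illustration, not part of numpy.
    """
    return {name: func(arr[name]) for name in arr.dtype.names}


mins = reduce_fields(rec_array, np.min)
means = reduce_fields(rec_array, np.mean)
print(mins)
print(means)
```

Each field is a plain homogeneous array, so every reduction works, and each result keeps the natural type for its field.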
    

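    With your multi-field indexing in a recent numpy version (1.16 or later, where multi-field indexing returns a view that keeps the original record layout), view(dt) reinterprets every byte of each record, including the fields you did not select: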

    In [23]: list(rec_homogeneous_generator(rec_array))
    Out[23]: 
    [rec.array([0.0e+000, 1.0e-002, 4.9e-324, 2.0e-001, 1.2e-001, 2.5e-323,
                3.0e-001, 8.2e-001, 3.5e-323],
               dtype=float64),
     rec.array([                  0, 4576918229304087675,                   1,
                4596373779694328218, 4593311331947716280,                   5,
                4599075939470750515, 4605561122934164029,                   7],
               dtype=int64)]
    

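    Multi-field indexing on its own shows why. The result still has an itemsize of 24, so the bytes of the unselected y field are carried along: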

    In [25]: rec_array[['x','x_2']]
    Out[25]: 
    rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
              dtype={'names':['x','x_2'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':24})
    

    Better handling of multi-field indexing, using repack_fields to drop the unused bytes:

    In [26]: import numpy.lib.recfunctions as rf
    In [28]: rf.repack_fields(rec_array[['x','x_2']])
    Out[28]: 
    rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
              dtype=[('x', '<f8'), ('x_2', '<f8')])
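
To make the difference concrete, here is a small sketch that compares the record size before and after repacking. The variable names (`subset`, `packed`, `raw`, `clean`) are only illustrative, and the sizes assume numpy 1.16 or later, where multi-field indexing returns a view.

```python
import numpy as np
import numpy.lib.recfunctions as rf

rec_array = np.rec.fromarrays(
    [[0.0, 0.2, 0.3], [0.01, 0.12, 0.82], [1, 5, 7]],
    dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')],
)

subset = rec_array[['x', 'x_2']]    # view: keeps the full 24-byte record layout
packed = rf.repack_fields(subset)   # copy: holds only the two selected fields

print(subset.dtype.itemsize, packed.dtype.itemsize)

# Viewing as float64 reinterprets every byte of each record, so the
# padded view yields 3 values per record and the packed copy yields 2.
raw = subset.view('<f8')
clean = packed.view('<f8')
print(raw.shape, clean.shape)
```

The extra value per record in `raw` is the integer field read as if it were a float, which is where the tiny denormal numbers in the generator output come from.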
    

    Now we can view the repacked fields as float:

    In [29]: rf.repack_fields(rec_array[['x','x_2']]).view(float)
    Out[29]: 
    rec.array([0.  , 0.01, 0.2 , 0.12, 0.3 , 0.82],
              dtype=float64)
    

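    This view is 1d, so it needs a reshape(-1, 2) before min(axis=0) can give one result per field.

    Or better yet, get a 2d array directly: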

    In [30]: rf.structured_to_unstructured(rec_array[['x','x_2']])
    Out[30]: 
    rec.array([[0.  , 0.01],
               [0.2 , 0.12],
               [0.3 , 0.82]],
              dtype=float64)
    

    These functions (repack_fields and structured_to_unstructured) are documented on the structured arrays page of the numpy user guide.
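
Putting this together, the generator from the question could be reworked along these lines. This is a sketch rather than the only way to do it: it groups fields by their actual dtype object instead of the `descr` string, and it yields the field names alongside each 2d block, which is a choice made here for readability.

```python
import numpy as np
import numpy.lib.recfunctions as rf


def rec_homogeneous_generator(rec_array):
    """Yield (field names, 2d array) for each group of same-dtype fields."""
    groups = {}
    for name in rec_array.dtype.names:
        field_dtype = rec_array.dtype.fields[name][0]
        groups.setdefault(field_dtype, []).append(name)

    for cols in groups.values():
        # structured_to_unstructured handles the padding left by
        # multi-field indexing and returns one column per field.
        yield cols, rf.structured_to_unstructured(rec_array[cols])


rec_array = np.rec.fromarrays(
    [[0.0, 0.2, 0.3], [0.01, 0.12, 0.82], [1, 5, 7]],
    dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')],
)

results = {}
for cols, block in rec_homogeneous_generator(rec_array):
    # One minimum per field (column) in the group.
    results[tuple(cols)] = np.asarray(block).min(axis=0)
    print(cols, results[tuple(cols)])
```

With the data from the question, the float group gives the minima 0.0 and 0.01 and the integer group gives 1, which matches what was expected.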