
recarray with lists: how to reference first element in list


I want to copy the contents of a few fields in a record array into an ndarray (both of type float64). I know how to do this when the recarray data has a single value in each field:

my_ndarray[:,0]=my_recarray['X']  #(for field 'X')

Now I have a recarray with a list of 5 floats in each field, and I only want to copy the first element of each list. When I use the above with the new recarray (and list), I get this error:

ValueError: could not broadcast input array from shape (92,5) into shape (92)

That makes total sense (in hindsight).

I thought I could get just the first element of each with this:

my_ndarray[:,0]=my_recarray['X'][0]  #(for field 'X')

I get this error:

ValueError: could not broadcast input array from shape (5) into shape (92)

I sort of understand: numpy is only taking the first row (5 elements) and trying to broadcast it into a 92-element column.

So now I'm wondering how to get the first element of each list down the 92-element column. Scratching my head... Thanks in advance for any advice.


Solution

  • My guess is that the recarray has a dtype where one of the fields has shape 5:

    In [48]: dt = np.dtype([('X',int,5),('Y',float)])
    In [49]: arr = np.zeros(3, dtype=dt)
    In [50]: arr
    Out[50]: 
    array([([0, 0, 0, 0, 0], 0.), ([0, 0, 0, 0, 0], 0.),
           ([0, 0, 0, 0, 0], 0.)], dtype=[('X', '<i8', (5,)), ('Y', '<f8')])
    

    Accessing this field by name produces an array with shape (3,5) (analogous to your (92,5)):

    In [51]: arr['X']
    Out[51]: 
    array([[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]])
    

    This could be described as a list of 5 items for each record, but indexing with the field name produces a 2d array, which can be indexed like any 2d numpy array.

    Let's set those values to something interesting:

    In [52]: arr['X'] = np.arange(15).reshape(3,5)
    In [53]: arr
    Out[53]: 
    array([([ 0,  1,  2,  3,  4], 0.), ([ 5,  6,  7,  8,  9], 0.),
           ([10, 11, 12, 13, 14], 0.)],
          dtype=[('X', '<i8', (5,)), ('Y', '<f8')])
    

    We can fetch the first column of this field (that is, the first element of each record's list) with:

    In [54]: arr['X'][:,0]
    Out[54]: array([ 0,  5, 10])
    

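    If you have several fields with a structure like this, you'll probably have to access each one by name. There's a limit to what you can do with multi-field indexing: selecting with a list of field names returns another structured array, not a plain 2d array, so it can't pick the first element of every field in one step.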
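
Applied to the situation in the question, the field-by-field copy might look like the sketch below. The field names 'X' and 'Y', the sample values, and the construction of my_recarray are assumptions made up for illustration; only the 92 records with 5 floats per field come from the question.

```python
import numpy as np

# Hypothetical stand-in for the asker's data: 92 records,
# each field holding 5 float64 values.
dt = np.dtype([('X', np.float64, (5,)), ('Y', np.float64, (5,))])
my_recarray = np.zeros(92, dtype=dt).view(np.recarray)
my_recarray['X'] = np.arange(92 * 5, dtype=np.float64).reshape(92, 5)
my_recarray['Y'] = -my_recarray['X']

# One output column per field to be copied.
field_names = ['X', 'Y']
my_ndarray = np.empty((len(my_recarray), len(field_names)), dtype=np.float64)

for col, name in enumerate(field_names):
    # my_recarray[name] has shape (92, 5); [:, 0] keeps the first
    # element of each record, giving shape (92,).
    my_ndarray[:, col] = my_recarray[name][:, 0]
```

The loop accesses each field by name, and the right-hand side has shape (92,), so it fits the target column without a broadcasting error.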