python, numpy, user-defined-types

numpy dtype error - (structured array creation)


I am having some trouble understanding why the following does not work:

np.dtype(dict(names="10", formats=np.float64))

I have been struggling with this because I would like to use the functions in numpy's recfunctions module, but problems with numpy.dtype have kept me from succeeding. This is the error I currently get:

dtype = np.dtype(dict(names=names, formats=formats))
ValueError: all items in the dictionary must have the same length.

I want to get a data structure that will contain a type of record array with multiple columns of data within each assigned field - similar to a dictionary where each value is a 2d array or several columns of data. Typically the data may end up being ~6 columns, ~2000 rows for each key or record, with ~200 records.

Here is what I have tried, as a complete script (it still gives the same error):

import numpy as np
from numpy.lib import recfunctions


# Just function to make random data
def make_data(i, j):
    # some arbitrary function to show that the number of columns may change, but rows stay the same length
    if i%3==0:
        data = np.array([[i for i in range(0,1150)]*t for t in range(0,3)])
    else:
        data = np.array([[i for i in range(0,1150)]*t for t in range(0,6)])
    return data

def data_struct(low_ij, high_ij):

    """
    Data Structure to contain several columns of data for different combined values between "low ij" and "high ij"

    Key: "(i, j)"
    Value: numpy ndarray (multidimensional)
    """

    for i in range(0,low_ij+1):
        for j in range(0,high_ij+1):
            # Get rid of some of the combinations
            # (unimportant)
            if(i<low_ij and j<low_ij):
                break
            elif(i<j):
                break

            # Combinations of interest to create structure
            else:
                names = str(i)+str(j)
                formats = np.float64
                data = np.array(make_data(i, j))
                try:
                    data_struct = recfunctions.append_fields(base=data_struct, names=names, data=data, dtypes=formats)
                # First loop will assign data_struct using this exception,
                # then proceed to use the try statement to add on the rest of the data
                except UnboundLocalError:
                    dtype = np.dtype(dict(names=names, formats=formats))
                    data_struct = np.array(data, dtype=dtype)

    return data_struct

Solution

  • Looks like you are trying to construct a structured array something like:

    In [152]: names=['1','2','3','4']
    In [153]: formats=[(float,2),(float,3),(float,2),(float,3)]
    In [154]: dt=np.dtype({'names':names, 'formats':formats})
    In [156]: ds=np.zeros(5, dtype=dt)
    
    In [157]: ds
    Out[157]: 
    array([([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
           ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
           ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
           ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
           ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0])], 
          dtype=[('1', '<f8', (2,)), ('2', '<f8', (3,)), ('3', '<f8', (2,)), 
               ('4', '<f8', (3,))])
    In [159]: ds['1']=np.arange(10).reshape(5,2)
    In [160]: ds['2']=np.arange(15).reshape(5,3)
    

    In other words, multiple fields, each with a different number of 'columns' (shape).
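
    As a side note on the error in the question: the dict form of np.dtype expects 'names' and 'formats' to be sequences of equal length, one entry per field. In the question, names is a bare string and formats is a bare type, which is most likely why the lengths do not match. A minimal sketch of the list form, reusing the field name '10' from the question (the 6-column inner shape is only an assumed example):

```python
import numpy as np

# names and formats are equal-length lists, one entry per field
dt_single = np.dtype({'names': ['10'], 'formats': [np.float64]})

# a field can also carry an inner shape, here 6 'columns' per row
dt_multi = np.dtype({'names': ['10'], 'formats': [(np.float64, (6,))]})

print(dt_single)
print(dt_multi)
```

    Wrapping even a single name and a single format in a list keeps the two entries the same length.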

    Here I initialize the whole array, and then fill the fields individually. That seems to be the most straightforward way of creating complex structured arrays.

    You are trying to build such an array incrementally, starting with one field, and adding new ones with recfunctions.append_fields:

    In [162]: i=1
         ...: ds2 = np.array(np.arange(5), dtype=np.dtype({'names':[str(i)], 'formats':[float]}))
    In [164]: i+=1
         ...: ds2 = recfunctions.append_fields(base=ds2, names=str(i), dtypes=float,
         ...:     data=np.arange(5), usemask=False, asrecarray=False)
    In [165]: i+=1
         ...: ds2 = recfunctions.append_fields(base=ds2, names=str(i), dtypes=float,
         ...:     data=np.arange(5), usemask=False, asrecarray=False)
    
    In [166]: ds2
    Out[166]: 
    array(data = [(0.0, 0.0, 0.0) (1.0, 1.0, 1.0) (2.0, 2.0, 2.0) 
        (3.0, 3.0, 3.0) (4.0, 4.0, 4.0)], 
        dtype = [('1', '<f8'), ('2', '<f8'), ('3', '<f8')])
    

    This works when the appended fields all have 1 'column'. With masking, they can even have different numbers of 'rows'. But when I try to vary the internal shape, append_fields has problems appending the field. merge_arrays isn't any more successful.
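
    If incremental growth is still wanted, one workaround is to skip recfunctions and rebuild the array by hand each time: make a new dtype with the extra field, allocate it, and copy the old fields across. The sketch below assumes every field has the same number of rows; add_field is a hypothetical helper written for this illustration, not a numpy function.

```python
import numpy as np

def add_field(base, name, data):
    """Hypothetical helper: return a copy of `base` with one extra field.

    `data` must have as many rows as `base`; any trailing dimensions
    become the inner shape of the new field.
    """
    data = np.asarray(data, dtype=np.float64)
    if data.shape[0] != base.shape[0]:
        raise ValueError("new field must have the same number of rows")
    # scalar field for 1d data, otherwise a field with an inner shape
    fmt = (np.float64, data.shape[1:]) if data.ndim > 1 else np.float64
    # keep the existing fields and append the new one
    old = [(n, base.dtype[n]) for n in base.dtype.names]
    out = np.zeros(base.shape, dtype=np.dtype(old + [(name, fmt)]))
    for n in base.dtype.names:
        out[n] = base[n]
    out[name] = data
    return out

ds2 = np.zeros(5, dtype=[('1', np.float64)])
ds2['1'] = np.arange(5)
ds2 = add_field(ds2, '2', np.arange(10).reshape(5, 2))
ds2 = add_field(ds2, '3', np.arange(15).reshape(5, 3))
print(ds2.dtype)
```

    Each call copies the whole array, so this is only reasonable for a modest number of fields.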

    Even if we can get the incremental recfunctions approach to work, it probably will be slower than the initialize-and-fill approach. Even if you don't know the shape of each of the fields at the start, you could collect them all in a dictionary, and assemble the array from that. This kind of structured array isn't any more compact or efficient than a dictionary. It just makes certain styles of data access (across fields) more convenient.
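
    A minimal sketch of the collect-then-assemble idea, assuming every block has the same number of rows. The field names, column counts, and random data below are made up for illustration; the 1150 rows come from the question's make_data.

```python
import numpy as np

n_rows = 1150  # every field shares the same number of rows

# collect each block of columns under its field name first
blocks = {
    '30': np.random.rand(n_rows, 3),
    '31': np.random.rand(n_rows, 6),
    '32': np.random.rand(n_rows, 6),
}

# one field per key; the inner shape is that block's column count
dt = np.dtype([(name, np.float64, (arr.shape[1],))
               for name, arr in blocks.items()])

# allocate once, then fill field by field
ds = np.zeros(n_rows, dtype=dt)
for name, arr in blocks.items():
    ds[name] = arr

print(ds.dtype)
print(ds['30'].shape, ds['31'].shape)
```

    Indexing with a field name, e.g. ds['31'], gives back the 2d block, and ds[0] gives one row across all fields.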