Tags: python, arrays, numpy, structured-array

Filter numpy structured array based on multiple values


I have a numpy structured array:

import numpy as np

myArray = np.array([(1, 1, 1, u'Zone3', 9.223),
                    (2, 1, 0, u'Zone2', 17.589),
                    (3, 1, 1, u'Zone2', 26.95),
                    (4, 0, 1, u'Zone1', 19.367),
                    (5, 1, 1, u'Zone1', 4.395)],
                   dtype=[('ID', '<i4'), ('Flag1', '<i4'), ('Flag2', '<i4'),
                          ('ZoneName', '<U5'), ('Value', '<f8')])

I would like to sum the values in the "Value" column when multiple criteria are met. If I want Flag1 and Flag2 to both equal 1, I can use:

sumResult = (sum(myArray[((myArray["Flag1"] == 1) & (myArray["Flag2"] == 1))]["Value"]))

However, I would also like to include a third criterion based on whether the values are in a list, something equivalent to using x in list:

criteriaList = ("Zone1", "Zone2")
sumResult = (sum(myArray[((myArray["Flag1"] == 1) & (myArray["Flag2"] == 1) &
                (myArray["ZoneName"] in criteriaList))]["Value"]))

This should equal 31.345. I am new to numpy and have explored masked arrays, but it is not clear to me whether or how they can be used with structured arrays. Thanks.


Solution

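  • The in operator cannot test each element of an array for membership, which is why In [1] below raises a ValueError. Use np.in1d to test the elements of myArray["ZoneName"] against criteriaList: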

    In [1]: myArray["ZoneName"] in criteriaList
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-1-ff2173ff4348> in <module>()
    ----> 1 myArray["ZoneName"] in criteriaList
    
    ValueError: The truth value of an array with more than one element is ambiguous.
    Use a.any() or a.all()
    
    In [2]: np.in1d(myArray["ZoneName"], criteriaList)
    Out[2]: array([False,  True,  True,  True,  True], dtype=bool)
    
    In [3]: myArray[(myArray["Flag1"] == 1) &
       ....:        (myArray["Flag2"] == 1) &
       ....:        np.in1d(myArray["ZoneName"], criteriaList)]["Value"].sum()
    Out[3]: 31.344999999999999
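
For reference, here is the same filter as a standalone script. It is only a sketch and makes one substitution: it calls np.isin instead of np.in1d, on the assumption that a reasonably recent NumPy is installed (np.isin is the newer function, and recent releases deprecate np.in1d). The mask variable is introduced here for readability and is not part of the original answer.

```python
import numpy as np

# Same data as in the question.
myArray = np.array(
    [(1, 1, 1, "Zone3", 9.223),
     (2, 1, 0, "Zone2", 17.589),
     (3, 1, 1, "Zone2", 26.95),
     (4, 0, 1, "Zone1", 19.367),
     (5, 1, 1, "Zone1", 4.395)],
    dtype=[("ID", "<i4"), ("Flag1", "<i4"), ("Flag2", "<i4"),
           ("ZoneName", "<U5"), ("Value", "<f8")],
)
criteriaList = ("Zone1", "Zone2")

# Build one boolean mask: each comparison is element-wise, and
# np.isin (assumed available; substitutes for np.in1d) tests each
# ZoneName for membership in criteriaList.
mask = (
    (myArray["Flag1"] == 1)
    & (myArray["Flag2"] == 1)
    & np.isin(myArray["ZoneName"], criteriaList)
)

# Select the "Value" field for the matching rows and sum it.
sumResult = myArray["Value"][mask].sum()
print(sumResult)
```

Each condition needs its own parentheses because & binds more tightly than ==. Building the mask once also makes it easy to reuse, for example myArray[mask] to inspect the matching rows.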