Tags: python, arrays, numpy, max, masked-array

NumPy 2D: how to get the index of the max element in the first column, considering only allowed values in the second column


I'm looking for a high-performance way to solve this problem. I have the output of a neural network (answers_weight), a category for each answer (answers_category, same length), and the categories allowed for the current request:

import numpy as np

answers_weight = np.asarray([0.9, 3.8, 3, 0.6, 0.7, 0.99])  # ~3 million items
answers_category = [1, 2, 1, 5, 3, 1]  # same size as answers_weight: ~3 million items
categories_allowed1 = [1, 5, 8]
res = np.stack((answers_weight, answers_category), axis=1)

I need the index (in the answers_weight array) of the maximum element, skipping elements whose category is not allowed (here, categories 2 and 3).

The result must be index 2 (weight 3.0), because 3.8 has to be skipped: its category is not allowed.


Solution

  • The easiest way is to use NumPy's masked arrays: mask the weights whose category is not in categories_allowed1, then take argmax:

    np.ma.masked_where(~np.isin(answers_category, categories_allowed1), answers_weight).argmax()
    # 2
    

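    Another way, using a boolean mask (this one assumes the maximum weight is unique; otherwise a disallowed element with the same weight at an earlier index would be returned instead):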

    mask = np.isin(answers_category, categories_allowed1)
    np.argwhere(answers_weight==answers_weight[mask].max())[0,0]
    #2
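
If the unique-maximum assumption is a concern, one possible variant is to take argmax over the allowed weights only and map the result back to the original positions. The sketch below is an illustration, not part of the original answer: the helper name argmax_allowed and the choice to return -1 when no category is allowed are assumptions made for this example.

```python
import numpy as np

def argmax_allowed(weights, categories, allowed):
    """Return the index (into weights) of the largest weight whose
    category is allowed, or -1 if no element qualifies.

    Hypothetical helper for illustration only.
    """
    weights = np.asarray(weights)
    mask = np.isin(categories, allowed)      # True where the category is allowed
    if not mask.any():
        return -1                            # assumed convention for "nothing allowed"
    allowed_idx = np.flatnonzero(mask)       # original positions of allowed elements
    # argmax runs over the filtered weights; map it back to the full array
    return int(allowed_idx[weights[mask].argmax()])

answers_weight = np.asarray([0.9, 3.8, 3, 0.6, 0.7, 0.99])
answers_category = [1, 2, 1, 5, 3, 1]
categories_allowed1 = [1, 5, 8]

best = argmax_allowed(answers_weight, answers_category, categories_allowed1)
print(best)  # 2
```

Because the comparison only ever looks at allowed elements, a disallowed element that happens to share the maximum weight cannot be picked. If several allowed elements tie, the first one is returned.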