I want to check, for each value in a 2D NumPy float array, whether it falls within the (min, max) boundaries of a given numerical class, and then replace that value with the 'score' associated with that class.
E.g., the class boundaries could be:
>>> class1 = (0, 1.5)
>>> class2 = (1.5, 2.5)
>>> class3 = (2.5, 3.5)
The class scores are:
>>> score1 = 0.75
>>> score2 = 0.50
>>> score3 = 0.25
Values outside all of the classes should default to, e.g., 99.
I've tried the following, but run into a ValueError due to broadcasting.
>>> import numpy as np
>>> arr_f = (6-0)*np.random.random_sample((4,4)) + 0 # array of random floats
>>> def reclasser(x, classes, news):
...     compare = [x >= min and x < max for (min, max) in classes]
...     try:
...         return news[compare.index(True)
...     except Value Error:
...         return 99.0
>>> v_func = np.vectorize(reclasser)
>>> out = v_func(arr_f, [class1, class2, class3], [score1, score2, score3])
ValueError: operands could not be broadcast together with shapes (4,4) (4,2) (4,)
Any suggestions on why this error occurs and how to remediate would be most appreciated. Also, if I'm entirely on the wrong path using vectorized functions, I'd also be happy to hear that.
Try to make the code work without `np.vectorize` first. The code above won't work even with a single float as its first argument: you misspelled `ValueError` and left out the closing `]` after `compare.index(True)`. Also, it's not a good idea to use `min` and `max` as variable names, since they shadow Python's built-in functions. A fixed version of `reclasser` would be:
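```python
def reclasser(x, classes, news):
    # half-open intervals [lo, hi), matching the question's x >= min and x < max
    compare = [lo <= x < hi for (lo, hi) in classes]
    try:
        return news[compare.index(True)]
    except ValueError:
        # list.index raises ValueError when no class matched
        return 99.0
```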
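As for the broadcasting error itself: `np.vectorize` converts every argument to an array and broadcasts them together, so the list of class tuples becomes an (n, 2) array and the score list an (n,) array, neither of which broadcasts against the (4, 4) data. If you do want to keep `np.vectorize`, its `excluded` parameter passes the chosen positional arguments to the function unchanged. A minimal sketch, assuming the class boundaries and scores from the question (the fixed `reclasser` is repeated so the snippet runs on its own):

```python
import numpy as np

# Class boundaries and scores from the question (half-open intervals [lo, hi)).
classes = [(0, 1.5), (1.5, 2.5), (2.5, 3.5)]
news = [0.75, 0.50, 0.25]

def reclasser(x, classes, news):
    """Return the score of the first class whose [lo, hi) range contains x, else 99.0."""
    compare = [lo <= x < hi for (lo, hi) in classes]
    try:
        return news[compare.index(True)]
    except ValueError:
        # list.index raises ValueError when no class matched
        return 99.0

# Check the scalar version first.
print(reclasser(1.0, classes, news))  # 0.75
print(reclasser(5.0, classes, news))  # 99.0

# Positional arguments 1 and 2 (classes, news) are passed through as-is
# instead of being converted to arrays and broadcast against x.
v_func = np.vectorize(reclasser, excluded={1, 2})

arr_f = 6 * np.random.random_sample((4, 4))
out = v_func(arr_f, classes, news)
print(out.shape)  # (4, 4)
```

This fixes the error, but `np.vectorize` is still essentially a Python-level loop over the elements, so it is a convenience rather than a speed-up.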
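That said, I think using `reclasser` with `np.vectorize` is unnecessarily complex, since `np.vectorize` is essentially a loop over the elements anyway. Instead, you could build a boolean mask for each class and assign the scores with NumPy indexing: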
```python
import numpy as np

# class -> score mapping as a dict
class_scores = {class1: score1, class2: score2, class3: score3}

# matrix of default scores (99 for values outside every class)
scores = np.full(arr_f.shape, 99.0)

for (lo, hi), score in class_scores.items():
    # boolean mask of the array values that belong to the current class [lo, hi)
    in_cls = (lo <= arr_f) & (arr_f < hi)
    # update scores for the current class via boolean indexing
    scores[in_cls] = score
```
`scores` will then be an array of scores with the same shape as the original data array, one score per element.
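The same mask-per-class idea can also be written more compactly with `np.select`, which takes a list of boolean conditions, a list of choices, and a default, and for each element returns the choice belonging to the first condition that holds. This is a swapped-in alternative rather than part of the answer above; a sketch assuming the question's example data and half-open intervals:

```python
import numpy as np

# Same example data as in the question.
classes = [(0, 1.5), (1.5, 2.5), (2.5, 3.5)]
scores_per_class = [0.75, 0.50, 0.25]
arr_f = 6 * np.random.random_sample((4, 4))

# One boolean mask per class, using half-open intervals [lo, hi).
condlist = [(lo <= arr_f) & (arr_f < hi) for (lo, hi) in classes]

# For each element, take the score of the first True condition;
# elements matching no condition get the default.
scores = np.select(condlist, scores_per_class, default=99.0)
print(scores)
```

Scalar choices are broadcast to the shape of the conditions, so there is no need to build full arrays of scores first.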