I have a Series with four categories (A, B, C, D) and their current values:
import pandas as pd
s1 = pd.Series({"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.9})
And a threshold against which I need to compare the categories,
threshold = {"custom": {"A, B": 0.6, "C": 0.3}, "default": 0.4}
But the threshold has two categories summed together (A and B), and it has a "default" threshold to apply to each category that hasn't been specifically named.
I can't quite work out how to do this in a general way. I can solve two separate subproblems, but not the whole problem.
**What I need to evaluate is:**
s1["A"] + s1["B"] < threshold["custom"]["A, B"] :: 0.2 + 0.3 < 0.6
s1["C"] < threshold["custom"]["C"] :: 0.3 < 0.3
s1["D"] < threshold["default"] :: 0.9 < 0.4
And return this Series:
# A,B True
# C False
# D False
Here is what I've got for the subproblems:
1. To apply the default threshold, I reindex and fillna with the default value:
aligned_threshold = (
    pd.Series(threshold.get("custom"))
    .reindex(s1.index)
    .fillna(threshold.get("default"))
)
# A 0.4
# B 0.4
# C 0.3
# D 0.4
Then I can compare:
s1 < aligned_threshold
# A True
# B True
# C False
# D False
# dtype: bool
2. To combine categories:
threshold_s = pd.Series(threshold.get("custom"))
s1_combined = pd.Series(index=threshold_s.index, dtype=float)
# iterate over the keys only, so the loop doesn't rebind the `threshold` dict
for category in threshold["custom"]:
    s1_combined[category] = sum(s1.get(k, 0) for k in category.split(", "))
# now s1_combined is:
# A,B 0.5
# C 0.3
s1_combined < threshold_s
# A,B True
# C False
# dtype: bool
But now I've lost category D.
To recap, what I need is:
s1[A]+s1[B]
s1[C]
s1[D]
So that I can compare thus:
s1 < threshold
And return this Series:
# A,B True
# C False
# D False
You could build a mapper to rename the index, then groupby.sum, and compare to the reference thresholds:
mapper = {x: k for k in threshold['custom']
          for x in k.split(', ')}
# {'A': 'A, B', 'B': 'A, B', 'C': 'C'}
s2 = (s1.rename(mapper)
        .groupby(level=0).sum()
     )
out = s2.lt(s2.index.to_series()
              .map(threshold['custom'])
              .fillna(threshold['default'])
            )
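To see why this works, it can help to look at the intermediate values. The snippet below is only an illustrative sketch using the data from the question; the names `renamed` and `limits` are introduced here for inspection and are not part of the answer's code.

```python
import pandas as pd

s1 = pd.Series({"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.9})
threshold = {"custom": {"A, B": 0.6, "C": 0.3}, "default": 0.4}

mapper = {x: k for k in threshold["custom"] for x in k.split(", ")}

# Renaming gives A and B the same label; D is not in the mapper and keeps its name.
renamed = s1.rename(mapper)
print(renamed.index.tolist())  # ['A, B', 'A, B', 'C', 'D']

# Summing per label collapses the duplicated "A, B" rows into a single row.
s2 = renamed.groupby(level=0).sum()

# Look up each label's threshold; labels without a custom entry get the default.
limits = s2.index.to_series().map(threshold["custom"]).fillna(threshold["default"])
print(limits.to_dict())  # {'A, B': 0.6, 'C': 0.3, 'D': 0.4}
```

Because D never appears in the mapper, it passes through the rename and the groupby untouched, which is what keeps it from being lost.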
Alternative for the last step if you don't have NaNs:
out = s2.lt(s2.index.map(threshold['custom']).values,
            fill_value=threshold['default'])
Output:
A, B True
C False
D False
dtype: bool
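If you need this in more than one place, the steps above could be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the function name `below_thresholds` and its `sep` parameter are hypothetical, and it assumes group keys are joined with ", " exactly as in the question's dictionary.

```python
import pandas as pd


def below_thresholds(values, custom, default, sep=", "):
    """Return a boolean Series: True where each (grouped) value is below its threshold."""
    # Map every individual category to the group key that contains it.
    mapper = {name: key for key in custom for name in key.split(sep)}
    # Sum the members of each group; unmapped categories stay on their own.
    grouped = values.rename(mapper).groupby(level=0).sum()
    # Use the custom threshold where one exists, otherwise the default.
    limits = grouped.index.to_series().map(custom).fillna(default)
    return grouped.lt(limits)


s1 = pd.Series({"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.9})
threshold = {"custom": {"A, B": 0.6, "C": 0.3}, "default": 0.4}

out = below_thresholds(s1, threshold["custom"], threshold["default"])
print(out)
```

Note that groupby sorts the resulting labels, so the output follows sorted label order rather than the original order of s1.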