Custom comparison for Pandas series and Dictionary


I have a Series with four categories (A, B, C and D) and their current values:

s1 = pd.Series({"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.9})

and a threshold dictionary against which I need to compare the categories:

threshold = {"custom": {"A, B": 0.6, "C": 0.3}, "default": 0.4}

But the threshold has two categories summed together (A and B), and it has a "default" threshold to apply to each category that hasn't been specifically named.

I can't quite work out how to do this in a general way. I can solve two separate subproblems, but not the whole problem:

  1. I can solve the problem with single categories and a default threshold.
  2. Or, I can solve combined categories, but not apply the default threshold.

**What I need to evaluate is:**

s1["A"] + s1["B"] < threshold["custom"]["A, B"] :: 0.2 + 0.3 < 0.6
s1["C"] < threshold["custom"]["C"] :: 0.3 < 0.3
s1["D"] < threshold["default"] :: 0.9 < 0.4

And return this Series:

# A, B     True
# C       False
# D       False
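
Written out by hand, that target looks like the sketch below. The grouping is hard-coded here, so this is only a reference for the expected result and not the general solution I'm after; it assumes the combined key is spelled "A, B" exactly as in the dictionary.

```python
import pandas as pd

s1 = pd.Series({"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.9})
threshold = {"custom": {"A, B": 0.6, "C": 0.3}, "default": 0.4}

# Hard-coded version of the three comparisons; spelling out the grouping
# by hand is exactly the part that needs to become general.
expected = pd.Series(
    {
        "A, B": s1["A"] + s1["B"] < threshold["custom"]["A, B"],
        "C": s1["C"] < threshold["custom"]["C"],
        "D": s1["D"] < threshold["default"],
    }
)
print(expected)
```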

Here is what I've got for the subproblems

1. To apply the default threshold, I reindex and fillna with the default value:

aligned_threshold = (
    pd.Series(threshold.get("custom"))
    .reindex(s1.index)
    .fillna(threshold.get("default"))
)
# A    0.4
# B    0.4
# C    0.3
# D    0.4

then I can compare:

s1 < aligned_threshold
# A     True
# B     True
# C    False
# D    False
# dtype: bool

2. To combine categories

threshold_s = pd.Series(threshold.get("custom"))
s1_combined = pd.Series(index=threshold_s.index, dtype=float)
# iterate over the keys only; using `threshold` as a loop variable
# would overwrite the threshold dict
for category in threshold["custom"]:
    s1_combined[category] = sum(s1.get(k, 0) for k in category.split(", "))
# now s1_combined is:
# A, B    0.5
# C       0.3
s1_combined < threshold_s
# A, B     True
# C       False
# dtype: bool

But this loses category D, which has no custom threshold.

To recap, what I need is:

s1["A"] + s1["B"]
s1["C"]
s1["D"]

So that I can compare thus: s1 < threshold

And return this Series:

# A, B     True
# C       False
# D       False

Solution

  • You could build a mapper that renames each category to its combined key, then use groupby.sum and compare the sums to the reference thresholds:

    mapper = {x: k for k in threshold['custom']
              for x in k.split(', ')}
    # {'A': 'A, B', 'B': 'A, B', 'C': 'C'}
    
    s2 = (s1.rename(mapper)
            .groupby(level=0).sum()
         )
    
    out = s2.lt(s2.index.to_series()
                   .map(threshold['custom'])           
                   .fillna(threshold['default'])
               )
    

    Alternative for the last step, if s2 itself contains no NaNs (fill_value would also fill a missing value in s2 with the default before comparing):

    out = s2.lt(s2.index.map(threshold['custom']).values,
                fill_value=threshold['default'])
    

    Output:

    A, B     True
    C       False
    D       False
    dtype: bool
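
For reuse, the steps above could be wrapped in a small function. This is only a sketch under the question's assumptions: the threshold dict has "custom" and "default" keys, and combined keys are joined with ", ". The names `below_threshold` and `sep` are made up for illustration and are not part of pandas.

```python
import pandas as pd


def below_threshold(values, threshold, sep=", "):
    """Compare grouped sums of `values` against custom or default limits.

    `below_threshold` and `sep` are hypothetical names for this sketch.
    """
    custom = threshold["custom"]
    # Map each individual category to the combined key it belongs to,
    # e.g. {"A": "A, B", "B": "A, B", "C": "C"}.
    mapper = {name: key for key in custom for name in key.split(sep)}
    # Labels missing from the mapper (here "D") keep their own name,
    # so they survive the groupby and are not dropped.
    grouped = values.rename(mapper).groupby(level=0).sum()
    # Look up each group's limit; groups without a custom entry
    # fall back to the default.
    limits = grouped.index.to_series().map(custom).fillna(threshold["default"])
    return grouped.lt(limits)


s1 = pd.Series({"A": 0.2, "B": 0.3, "C": 0.3, "D": 0.9})
threshold = {"custom": {"A, B": 0.6, "C": 0.3}, "default": 0.4}
out = below_threshold(s1, threshold)
print(out)
```

Because `rename` leaves unmapped labels untouched, any category without a custom entry passes through unchanged and picks up the default threshold in the `fillna` step.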