I am trying to implement a function that checks whether a Counter contains a "similar" percentage of each item. For example:
from collections import Counter
c = Counter(["Dog", "Cat", "Dog", "Horse", "Dog"])
size = 5
lst = list(c.values())
percentages = [x / size * 100 for x in lst] # [60.0, 20.0, 20.0]
How can I check whether those percentages are all "similar"? I would like to use math.isclose with abs_tol=2, but it takes two arguments, not an entire list.
In this example, the items do not occur with similar frequency.
This function will be used to check whether a training set of labels is balanced.
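One way is to pick the minimum and maximum values of the percentages list and pass those to isclose(). If the two extremes are within the tolerance of each other, every other pair of values is as well: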
from math import isclose
from collections import Counter
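def is_balanced(lst, abs_tol):
    c = Counter(lst)
    total = c.total()  # Counter.total() requires Python 3.10+
    percentages = [(v / total) * 100 for v in c.values()]
    # If the smallest and largest shares are close, all shares are close.
    return isclose(min(percentages), max(percentages), abs_tol=abs_tol)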
lst1 = ["Dog", "Cat", "Dog", "Horse", "Dog"]
lst2 = ["Dog", "Cat", "Horse"]
print(is_balanced(lst1, 2)) # False
print(is_balanced(lst2, 2)) # True
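
If you are on a Python version older than 3.10, or the label list might be empty, a variant along these lines may help. This is only a sketch: the name is_balanced_compat is made up for illustration, treating an empty list as balanced is an assumption you may want to change, and it compares max minus min directly against abs_tol rather than calling isclose (which should behave the same for tolerances of this size).

```python
from collections import Counter


def is_balanced_compat(labels, abs_tol=2.0):
    """Return True if all label percentages lie within abs_tol points of each other."""
    counts = Counter(labels)
    total = sum(counts.values())  # works on Python < 3.10, unlike Counter.total()
    if total == 0:
        # Assumption: an empty label set is treated as trivially balanced.
        return True
    percentages = [count / total * 100 for count in counts.values()]
    # If the extremes are within abs_tol, every pair of values is too.
    return max(percentages) - min(percentages) <= abs_tol


print(is_balanced_compat(["Dog", "Cat", "Dog", "Horse", "Dog"]))  # False
print(is_balanced_compat(["Dog", "Cat", "Horse"]))  # True
print(is_balanced_compat([]))  # True
```

Note that the original is_balanced would raise a ValueError on an empty list, because min() and max() cannot take an empty sequence.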