Tags: python, collections, python-collections

collections.Counter: most_common INCLUDING equal counts


In collections.Counter, the method most_common(n) returns a list of only the n most frequent items. I need exactly that, but I also need to include every item whose count ties with the n-th one.

from collections import Counter
test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
-->Counter({'A': 3, 'C': 2, 'B': 2, 'D': 2, 'E': 1, 'G': 1, 'F': 1, 'H': 1})
test.most_common(2)
-->[('A', 3), ('C', 2)]

I would need [('A', 3), ('B', 2), ('C', 2), ('D', 2)], since for n=2 the items B, C and D all have the same count as the 2nd most common item. My real data is DNA code and could be quite large, so I need this to be reasonably efficient.


Solution

  • You can do something like this:

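    from collections import Counter
    from itertools import takewhile
    
    def get_items_upto_count(dct, n):
        data = dct.most_common()  # all items, sorted by count (descending)
        if n <= 0 or not data:
            return []
        # Count of the n-th item (index n-1); use the last item if n is too large.
        val = data[min(n, len(data)) - 1][1]
        # Now collect all items whose count is greater than or equal to `val`.
        return list(takewhile(lambda x: x[1] >= val, data))
    
    test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
    
    print(get_items_upto_count(test, 2))
    # [('A', 3), ('B', 2), ('C', 2), ('D', 2)]
    # (on Python versions before 3.7 the order of tied items may differ)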
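
Since the question mentions large data, here is a rough alternative sketch that avoids sorting every item: it finds the n-th largest count with heapq.nlargest, then keeps only the items at or above that cutoff. The function name most_common_with_ties is made up for this illustration, and whether it is actually faster depends on the data, so treat it as something to benchmark rather than a guaranteed improvement.

```python
import heapq
from collections import Counter

def most_common_with_ties(counter, n):
    """Return the n most common items plus any items tied with the n-th.

    Hypothetical helper for illustration; not part of collections.
    """
    if n <= 0 or not counter:
        return []
    # The n-th largest count is the cutoff (or the smallest count if n
    # exceeds the number of items). nlargest avoids a full sort.
    cutoff = heapq.nlargest(n, counter.values())[-1]
    # Keep every item at or above the cutoff, then sort only those.
    kept = [(item, count) for item, count in counter.items() if count >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept

test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
print(most_common_with_ties(test, 2))
```

With the sample data above, the cutoff count is 2, so A, B, C and D are returned. Only the surviving items are sorted, which may help when n is small compared to the number of distinct keys.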