python algorithm multiset

Leanest way to compute term frequency on a bag ADT without using the Counter class


I have some code that computes term frequency on a given list using the Counter class from the collections module.

    from collections import Counter

    terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
    tf = Counter(terms)
    print(tf)

The existing code works well, but I am wondering what the leanest way would be to achieve the same result strictly with a bag/multiset ADT, without the help of Python's Counter class.

I have spent several days experimenting with code and searching other forums without much success.


Solution

  • You can use a single dictionary comprehension:

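    terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
    # Note: list.count rescans the whole list for every term, so this is
    # quadratic in len(terms); fine for short lists, slow for large ones.
    new_terms = {term: terms.count(term) for term in terms}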
    

    Output (key order may differ depending on the Python version):

    {'lazy': 1, 'over': 1, 'fox': 2, 'dog': 1, 'quick': 1, 'the': 3, 'jumps': 1}
    

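    Using the third-party multiset package, building one Multiset per distinct term:

    import itertools
    import multiset  # third-party package, not part of the standard library

    # groupby only groups adjacent equal items, so the list is sorted first
    final_data = [multiset.Multiset(list(group)) for _, group in itertools.groupby(sorted(terms))]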
    

    Output:

    [Multiset({'dog': 1}), Multiset({'fox': 2}), Multiset({'jumps': 1}), Multiset({'lazy': 1}), Multiset({'over': 1}), Multiset({'quick': 1}), Multiset({'the': 3})]
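
If the goal is a bag ADT with no Counter and no third-party dependency, a hand-written class over a plain dict is one option. The following is only a minimal sketch: the class name Bag and its methods (add, count, items) are hypothetical names chosen for illustration and do not come from any library. It makes a single pass over the list, so it avoids the repeated list.count scans of the comprehension above.

```python
class Bag:
    """Minimal bag (multiset) ADT backed by a dict.

    Hypothetical helper written for this example, not a library class.
    """

    def __init__(self, items=()):
        self._counts = {}
        for item in items:
            self.add(item)

    def add(self, item):
        # dict.get supplies 0 the first time an item is seen
        self._counts[item] = self._counts.get(item, 0) + 1

    def count(self, item):
        # Items that were never added have a count of 0
        return self._counts.get(item, 0)

    def __len__(self):
        # Total number of items, duplicates included
        return sum(self._counts.values())

    def items(self):
        # (term, frequency) pairs
        return self._counts.items()


terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
tf = Bag(terms)
print(dict(tf.items()))
```

Here tf.count('the') returns 3, and dict(tf.items()) gives the same term-to-frequency mapping that Counter(terms) would hold.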