I have some code that works well for computing term frequencies over a given list, using the Counter class from the collections module:
from collections import Counter
terms=['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
tf = Counter(terms)
print(tf)
The existing code works great, but I am wondering what the leanest way would be to achieve the same result strictly using a bag/multiset ADT, without the help of Python's Counter class.
I have spent several days experimenting with code and looking on other forums without much success.
You can use a single dictionary comprehension:
terms=['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
new_terms = {term:terms.count(term) for term in terms}
Output:
{'lazy': 1, 'over': 1, 'fox': 2, 'dog': 1, 'quick': 1, 'the': 3, 'jumps': 1}
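Note that the comprehension above calls terms.count(term) once for every element, so it rescans the whole list each time and recomputes the count for repeated terms. For short lists that is fine. For longer ones, a single pass over the list does the same job; the following is a minimal sketch using only a plain dict (the name tf is borrowed from the question):

```python
terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

# Single pass: each term is looked up and incremented exactly once.
tf = {}
for term in terms:
    # dict.get returns 0 for a term that has not been seen yet.
    tf[term] = tf.get(term, 0) + 1

print(tf)
```

The resulting dict maps each distinct term to its number of occurrences, the same mapping the comprehension produces.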
Using the third-party multiset package:
import itertools
import multiset
final_data = [multiset.Multiset(list(b)) for a, b in itertools.groupby(sorted(terms))]
Output:
[Multiset({'dog': 1}), Multiset({'fox': 2}), Multiset({'jumps': 1}), Multiset({'lazy': 1}), Multiset({'over': 1}), Multiset({'quick': 1}), Multiset({'the': 3})]
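The groupby approach yields one Multiset per distinct term rather than a single bag holding all of them. If the goal is a bag/multiset ADT with no external dependency, one option is to write a small class yourself. The sketch below is only an illustration: the class name Bag and its methods (add, count, items) are hypothetical choices, not part of any library.

```python
class Bag:
    """Minimal bag/multiset: an unordered collection that allows duplicates."""

    def __init__(self, items=()):
        self._counts = {}  # maps each element to its multiplicity
        for item in items:
            self.add(item)

    def add(self, item):
        """Insert one occurrence of item."""
        self._counts[item] = self._counts.get(item, 0) + 1

    def count(self, item):
        """Return how many times item occurs (0 if absent)."""
        return self._counts.get(item, 0)

    def items(self):
        """Return (element, multiplicity) pairs."""
        return self._counts.items()

    def __len__(self):
        """Total number of elements, counting duplicates."""
        return sum(self._counts.values())


terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
bag = Bag(terms)
print(dict(bag.items()))
print(bag.count('the'))
```

Storing multiplicities in a dict keeps add and count at constant time on average, and the term frequencies can be read directly from items().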