I have the following data
[[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
I need the following output
[ABC, 2, 7]
[BCD, 4, 13]
[CDE, 1, 3]
[DEF, 1, 3]
I need to count how many times each word (position [1]) occurs and sum the numbers (position [0]) for that word. Each row of the result is
[Word, freq, sum of weight]
I checked "Finding frequencies of pair items in a list of pairs" and "Finding frequency distribution of a list of numbers in python", but they did not solve my problem.
I tried this, without success:
res = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
d = {}
for freq, label in res:
    if label not in d:
        d[label] = {}
    inner_dict = d[label]
    if freq not in inner_dict:
        inner_dict[freq] = 0
    inner_dict[freq] += freq
print(inner_dict)
With pandas:
import pandas
data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
df = pandas.DataFrame(data, columns=['count', 'word'])
result = df.groupby('word')['count'].agg((len, sum))
Result:
      len  sum
word
ABC     2    7
BCD     4   13
CDE     1    3
DEF     1    3
To sort the result, use sort_values:
result.sort_values(['sum', 'len'])
Result:
      len  sum
word
CDE     1    3
DEF     1    3
ABC     2    7
BCD     4   13
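If pandas is not an option, the same grouping can be sketched in plain Python. This is only a minimal illustration, not part of the pandas approach above: it assumes every item is a [weight, word] pair as in the question, and the names totals and result are made up for the example.

```python
from collections import defaultdict

data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'],
        [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

# Hypothetical helper mapping: word -> [frequency, sum of weights]
totals = defaultdict(lambda: [0, 0])
for weight, word in data:
    totals[word][0] += 1       # one more occurrence of this word
    totals[word][1] += weight  # add this occurrence's weight

# Build [word, freq, sum of weight] rows, ordered alphabetically by word
result = [[word, freq, total] for word, (freq, total) in sorted(totals.items())]

for row in result:
    print(row)
```

The rows come out as lists in the [Word, freq, sum of weight] layout the question asks for; to order by sum and then frequency instead, sort result with key=lambda r: (r[2], r[1]).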