
How to count the frequency of words and add the associated weight of the words in a list of lists


I have the following data

[[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

I need the following output

[ABC, 2, 7]
[BCD, 4, 13]
[CDE, 1, 3]
[DEF, 1, 3]

I need to count how many times each word (position [1]) occurs and sum the numbers at position [0] for that word. Each row of the result should be

[Word, freq, sum of weight] 

I checked "Finding frequencies of pair items in a list of pairs" and "Finding frequency distribution of a list of numbers in python", but they did not solve my problem.

I tried this, without success:

res = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
d = {}
for freq, label in res:
    if label not in d:
        d[label] = {}
    inner_dict = d[label]
    if freq not in inner_dict:
        inner_dict[freq] = 0
    inner_dict[freq] += freq

print(inner_dict)

Solution

  • With pandas:

    import pandas
    data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
    df = pandas.DataFrame(data, columns=['count', 'word'])
    result = df.groupby('word')['count'].agg((len, sum))
    

    Result:

          len  sum
    word
    ABC     2    7
    BCD     4   13
    CDE     1    3
    DEF     1    3
    

    To sort the result, use sort_values:

    result.sort_values(['sum', 'len'])

    Result:

          len  sum
    word
    CDE     1    3
    DEF     1    3
    ABC     2    7
    BCD     4   13
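
  • Without pandas (a minimal sketch, not part of the original answer): the attempt in the question keys its inner dictionary on the weight and prints only the dictionary for the last word processed, so it never yields one frequency and one total per word. One possible fix is to keep a [freq, sum] pair per word in a single dictionary. The function name summarize is hypothetical, and the code assumes Python 3.7+ so that the dictionary preserves first-seen word order.

```python
# Sketch: aggregate [weight, word] pairs with a plain dict (no pandas).
data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'],
        [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]


def summarize(pairs):
    """Return [word, freq, sum of weight] rows in first-seen word order."""
    totals = {}  # maps word -> [freq, weight_sum]
    for weight, word in pairs:
        entry = totals.setdefault(word, [0, 0])
        entry[0] += 1        # one more occurrence of this word
        entry[1] += weight   # accumulate the weight for this word
    return [[word, freq, total] for word, (freq, total) in totals.items()]


rows = summarize(data)
for row in rows:
    print(row)
# ['ABC', 2, 7]
# ['BCD', 4, 13]
# ['CDE', 1, 3]
# ['DEF', 1, 3]
```

    To get the same ordering as the sorted pandas result, the rows could be sorted with sorted(rows, key=lambda r: (r[2], r[1])).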