Frequency or Count of a list of sets in Python


I have a large dataset of flight legs that I want to build a graph out of, where the weight of each edge is the number of times that particular leg was flown. The pairs of cities involved in each leg are stored as a list of sets. I am having trouble creating a count/frequency dictionary because sets are unhashable.

my_test_list = [{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'DOHA', 'JAKARTA'},{'DOHA', 'ROME'},{'MAURITIUS','ROME'},{'MAURITIUS', 'ROME'},{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'JAKARTA', 'ROME'}, {'DOHA', 'ROME'},{'NEW YORK   NY', 'WASHINGTON, DC'},{'ACCRA', 'WASHINGTON, DC'}]

Ideally, I would like to have an output like this that I can feed into networkx:

edge_list = [('DOHA', 'ROME', {'frequency': 4}), ('DOHA', 'JAKARTA', {'frequency': 3}),('MAURITIUS', 'ROME', {'frequency': 2}), ('ROME', 'JAKARTA', {'frequency': 1}),('NEW YORK   NY', 'WASHINGTON, DC', {'frequency': 1}),('ACCRA', 'WASHINGTON, DC', {'frequency': 1}) ]

This is what I have done and it seems ghastly.

my_concat_list=[]
for item in my_test_list:
    out=""
    while len(item) !=0:
        out=out+";"+item.pop()
    my_concat_list.append(out)

my_concat_list winds up looking like this:

 [';DOHA;ROME',
 ';JAKARTA;DOHA',
 ';JAKARTA;DOHA',
 ';DOHA;ROME',
 ';ROME;MAURITIUS',
 ';ROME;MAURITIUS',
 ';DOHA;ROME',
 ';JAKARTA;DOHA',
 ';JAKARTA;ROME',
 ';DOHA;ROME',
 ';WASHINGTON, DC;NEW YORK   NY',
 ';ACCRA;WASHINGTON, DC']

I use Counter to get the frequency.

from collections import Counter
my_out = Counter(my_concat_list)

The output I get is:

Counter({';DOHA;ROME': 4,
         ';JAKARTA;DOHA': 3,
         ';ROME;MAURITIUS': 2,
         ';JAKARTA;ROME': 1,
         ';WASHINGTON, DC;NEW YORK   NY': 1,
         ';ACCRA;WASHINGTON, DC': 1})

From here, I can get the final format I want:

my_final_list=[]
for item in my_out.keys():
    temp_list = item.split(";")
    weight = my_out[item]
    my_new_tuple = (temp_list[1],temp_list[2],{'frequency':weight})
    my_final_list.append(my_new_tuple)
my_final_list

This is what my_final_list looks like:

[('DOHA', 'ROME', {'frequency': 4}),
 ('JAKARTA', 'DOHA', {'frequency': 3}),
 ('ROME', 'MAURITIUS', {'frequency': 2}),
 ('JAKARTA', 'ROME', {'frequency': 1}),
 ('WASHINGTON, DC', 'NEW YORK   NY', {'frequency': 1}),
 ('ACCRA', 'WASHINGTON, DC', {'frequency': 1})]

But there's got to be a better way of doing this. This seems really clunky.


Solution

  • Sets are unhashable, so they cannot be used as Counter keys, but tuples can. If you convert each set into a sorted tuple, you can use a Counter directly on the input data. Sorting matters because set iteration order is not guaranteed, so two equal sets could otherwise produce differently ordered tuples and be counted separately. You can then use a list comprehension to convert the Counter into the format you desire:

    from collections import Counter
    
    my_test_list = [{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'DOHA', 'JAKARTA'},{'DOHA', 'ROME'},{'MAURITIUS','ROME'},{'MAURITIUS', 'ROME'},{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'JAKARTA', 'ROME'}, {'DOHA', 'ROME'},{'NEW YORK   NY', 'WASHINGTON, DC'},{'ACCRA', 'WASHINGTON, DC'}]
    
    # Sort each set so that equal sets always map to the same tuple key
    counts = Counter(tuple(sorted(s)) for s in my_test_list)
    
    # Append the attribute dict to each (city, city) key
    result = [k + ({'frequency': v},) for k, v in counts.items()]
    print(result)
    

    Output (line breaks added for readability):

    [
     ('DOHA', 'ROME', {'frequency': 4}),
     ('DOHA', 'JAKARTA', {'frequency': 3}),
     ('MAURITIUS', 'ROME', {'frequency': 2}),
     ('JAKARTA', 'ROME', {'frequency': 1}),
     ('NEW YORK   NY', 'WASHINGTON, DC', {'frequency': 1}),
     ('ACCRA', 'WASHINGTON, DC', {'frequency': 1})
    ]
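
As an alternative sketch (not part of the answer above), each leg can be counted as a frozenset, which is hashable and compares equal regardless of element order. The endpoints are sorted only when building the final tuples, and `most_common()` orders the edges by frequency. The names `leg`, `n`, and `edge_list` are chosen here for illustration.

```python
from collections import Counter

my_test_list = [
    {'DOHA', 'ROME'}, {'DOHA', 'JAKARTA'}, {'DOHA', 'JAKARTA'},
    {'DOHA', 'ROME'}, {'MAURITIUS', 'ROME'}, {'MAURITIUS', 'ROME'},
    {'DOHA', 'ROME'}, {'DOHA', 'JAKARTA'}, {'JAKARTA', 'ROME'},
    {'DOHA', 'ROME'}, {'NEW YORK   NY', 'WASHINGTON, DC'},
    {'ACCRA', 'WASHINGTON, DC'},
]

# frozenset is an immutable, hashable set, so it can serve as a Counter key
counts = Counter(frozenset(leg) for leg in my_test_list)

# most_common() yields (key, count) pairs from most to least frequent;
# sorted() gives each edge a deterministic endpoint order
edge_list = [
    (*sorted(leg), {'frequency': n})
    for leg, n in counts.most_common()
]
print(edge_list)
```

A list of `(u, v, attribute_dict)` tuples in this shape is the form that networkx's `Graph.add_edges_from` takes. Note that the edge attribute here is named `frequency`, so it would likely need to be passed explicitly (or renamed to `weight`) for any algorithm that should use it as the edge weight. Unlike the original string-concatenation approach, this does not empty the input sets with `pop()`.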