
Counter of lists of lists


I have a big dictionary coming from a simulation loop that looks something like this:

my_dict = {
    'a': {
        1: [[1,2,3], [1,2,3], [1,2,3], [1,3,5]],
        2: [[2,44,57,18], [2,44,57,18], [2,44,57,23], [2,44,57,23]]},
    'b': {
        3: [[3,67,50], [3,67,50], [3,36]],
        4: [[4,12,34], [4,12]]}}

The structure itself is odd, but I couldn't figure out any other way to store it in my loop. My final goal is to obtain, for every letter key (a, b) and every element, the proportion of lists that are the same. That is, I want this (in any format, not necessarily a dictionary):

[Image: table of desired outputs]

Importantly, I don't care about comparisons within list elements. I need to check whether the full list appears multiple times. Within each list there are no repeated elements. Counter does not operate at the list level and, if I transform the lists to strings, I can't convert them back later (i.e. "123" could be [1,2,3] or [1,23]).
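
To illustrate the two obstacles described above, here is a minimal sketch (the variable names are only for illustration). It shows that Counter rejects lists as keys, and that joining the numbers into a string loses the element boundaries:

```python
from collections import Counter

lists = [[1, 2, 3], [1, 2, 3], [1, 3, 5]]

# Lists are mutable and therefore unhashable, so Counter cannot use them as keys.
try:
    Counter(lists)
    counter_error = None
except TypeError as exc:
    counter_error = str(exc)

print(counter_error)

# Joining the numbers into one string drops the boundaries between elements,
# so two different lists can end up with the same string.
as_string_a = "".join(map(str, [1, 2, 3]))
as_string_b = "".join(map(str, [1, 23]))
ambiguous = as_string_a == as_string_b

print(as_string_a, as_string_b, ambiguous)
```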

I also tried moving to a pandas dataframe and exploding the columns but then count() does not work either...

Also importantly, I do care about efficiency as there are on the order of 700k lists.


Solution

  • Lists are unhashable, so Counter cannot use them as keys, but tuples can be. You can convert the lists into tuples before calling Counter:

    from collections import Counter
    
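    summary = []
    for name1, sub_dict in my_dict.items():
        for ind, lists in sub_dict.items():
            # Tuples are hashable, so each full list is counted as one key
            C = Counter(map(tuple, lists))
            total = sum(C.values())  # number of lists under this element
            for arr, freq in C.items():
                # Convert the tuple back to a list for the output row
                summary.append([name1, ind, list(arr), freq, total])
    
    # Each row: [letter key, element, list, count, total number of lists]
    for row in summary:
        print(row)
    
    # Output: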
    ['a', 1, [1, 2, 3], 3, 4]
    ['a', 1, [1, 3, 5], 1, 4]
    ['a', 2, [2, 44, 57, 18], 2, 4]
    ['a', 2, [2, 44, 57, 23], 2, 4]
    ['b', 3, [3, 67, 50], 2, 3]
    ['b', 3, [3, 36], 1, 3]
    ['b', 4, [4, 12, 34], 1, 2]
    ['b', 4, [4, 12], 1, 2]
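
The rows above give the count and the total for each distinct list; the proportion the question asks for is the count divided by the total. A minimal sketch of that last step, assuming the same my_dict as in the question (list_proportions is a hypothetical helper name, not part of the original answer):

```python
from collections import Counter

# Same sample data as in the question.
my_dict = {
    'a': {
        1: [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 3, 5]],
        2: [[2, 44, 57, 18], [2, 44, 57, 18], [2, 44, 57, 23], [2, 44, 57, 23]]},
    'b': {
        3: [[3, 67, 50], [3, 67, 50], [3, 36]],
        4: [[4, 12, 34], [4, 12]]}}


def list_proportions(data):
    """Return rows of (letter key, element, list, proportion)."""
    rows = []
    for name, sub_dict in data.items():
        for ind, lists in sub_dict.items():
            # Convert each list to a hashable tuple so Counter can count it.
            counts = Counter(map(tuple, lists))
            total = len(lists)
            for arr, freq in counts.items():
                rows.append((name, ind, list(arr), freq / total))
    return rows


rows = list_proportions(my_dict)
for row in rows:
    print(row)
```

Each list is converted to a tuple once and counted once, so the work grows linearly with the number of lists, which should be reasonable for data on the order of 700k lists.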