I have a big dictionary coming from a simulation loop that looks something like this:
my_dict = {
    'a': {
        1: [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 3, 5]],
        2: [[2, 44, 57, 18], [2, 44, 57, 18], [2, 44, 57, 23], [2, 44, 57, 23]]},
    'b': {
        3: [[3, 67, 50], [3, 67, 50], [3, 36]],
        4: [[4, 12, 34], [4, 12]]}}
The structure itself is odd, but I couldn't figure out any other way to store it in my loop. My final goal is to obtain, for every letter key (a, b) and every element, the proportion of lists that are identical. That is, I want something like this (in any format, not necessarily a dictionary):
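a, 1: [1, 2, 3] -> 0.75, [1, 3, 5] -> 0.25
a, 2: [2, 44, 57, 18] -> 0.5, [2, 44, 57, 23] -> 0.5
b, 3: [3, 67, 50] -> 0.67, [3, 36] -> 0.33
b, 4: [4, 12, 34] -> 0.5, [4, 12] -> 0.5
(the values are the share of identical lists within each key, worked out by hand from the example data above; the exact layout doesn't matter to me)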
Importantly, I don't care about comparisons within list elements; I need to compare whether the full list appears multiple times. Within each list there are no repeated elements. Counter does not operate at the list level, and if I transform the lists to strings I can't convert them back later (i.e. "123" could be [1,2,3] or [1,23]).
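For example, joining the elements as strings gives the same result for two different lists:
"".join(map(str, [1, 2, 3]))  # '123'
"".join(map(str, [1, 23]))    # also '123', so the original boundaries are lost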
I also tried moving to a pandas DataFrame and exploding the columns, but then count() does not work either...
Also importantly, I do care about efficiency, as there are on the order of 700k lists.
You can convert the lists into tuples before calling Counter:
from collections import Counter

summary = []
for name1, sub_dict in my_dict.items():
    for ind, lists in sub_dict.items():
        # tuples are hashable, so identical lists collapse into one Counter key
        C = Counter(map(tuple, lists))
        total = sum(C.values())  # number of lists under this (letter, element) pair
        for arr, freq in C.items():
            summary.append([name1, ind, list(arr), freq, total])

for row in summary:
    print(row)
['a', 1, [1, 2, 3], 3, 4]
['a', 1, [1, 3, 5], 1, 4]
['a', 2, [2, 44, 57, 18], 2, 4]
['a', 2, [2, 44, 57, 23], 2, 4]
['b', 3, [3, 67, 50], 2, 3]
['b', 3, [3, 36], 1, 3]
['b', 4, [4, 12, 34], 1, 2]
['b', 4, [4, 12], 1, 2]
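Since you ultimately want proportions rather than raw counts, divide each frequency by its group total. A minimal sketch reusing the summary list built above (the name proportions is just for illustration):

# proportion of identical lists within each (letter, element) group
proportions = [[name1, ind, arr, freq / total]
               for name1, ind, arr, freq, total in summary]
for row in proportions:
    print(row)

['a', 1, [1, 2, 3], 0.75]
['a', 1, [1, 3, 5], 0.25]
['a', 2, [2, 44, 57, 18], 0.5]
['a', 2, [2, 44, 57, 23], 0.5]
['b', 3, [3, 67, 50], 0.6666666666666666]
['b', 3, [3, 36], 0.3333333333333333]
['b', 4, [4, 12, 34], 0.5]
['b', 4, [4, 12], 0.5]

Both passes are a single loop over the data, so the whole thing stays linear in the number of lists and should cope with ~700k of them comfortably.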