Given a Counter, e.g.:
>>> from collections import Counter
>>> Counter('123112415121361273')
Counter({'1': 7, '2': 4, '3': 3, '5': 1, '4': 1, '7': 1, '6': 1})
How can I reverse the index and get the counts as keys and the values as lists of original string keys?
The purpose is to convert the example above into something like this:
defaultdict(<type 'list'>, {1: ['5', '4', '7', '6'], 3: ['3'], 4: ['2'], 7: ['1']})
I have tried tried manually reiterating through the Counter
:
>>> from collections import Counter
>>> Counter('123112415121361273')
Counter({'1': 7, '2': 4, '3': 3, '5': 1, '4': 1, '7': 1, '6': 1})
>>> x = Counter('123112415121361273')
>>> from collections import Counter, defaultdict
>>> y = defaultdict(list)
>>> for s, count in x.items():
... y[count].append(s)
...
>>> y
defaultdict(<type 'list'>, {1: ['5', '4', '7', '6'], 3: ['3'], 4: ['2'], 7: ['1']})
But is there any other way to do this?
Since the input is the string '123112415121361273'
and the output is supposed to be the dictionary indexed by counts, is there any way to avoid the counting step when iterating through it the first time and get to the resulting defaultdict?
No, there is no more efficient way.
Counting is best done with a mapping, which is exactly what Counter
does. Since you don't know the final count for any character until you fully traversed the string, you can't know up-front what bucket to file a character into until you have completed counting.
So the in-efficient alternative is to start with a mapping from count to characters, then move characters up to the next bucket as you find that they already have a count. Finding that they already have a count requires that you test against every bucket, so that becomes a O(NK) solution as opposed to your straightforward linear O(N) solution that the Counter
gives you.
## Warning: this is not an efficient approach; use for illustration purposes only
from collections import defaultdict
s = '123112415121361273'
count_to_char = defaultdict(set) # use a set to avoid O(N**2) performance
max_count = 0
for char in s: # loop over N items
for i in range(1, max_count + 1): # loop over up to K buckets
if char in count_to_char[i]:
count_to_char[i].remove(char)
count_to_char[i + 1].add(char)
break
else:
i = 0
count_to_char[1].add(char)
max_count = max(i + 1, max_count)
# remove empty buckets again
for count in [c for c, b in count_to_char.items() if not b]:
del count_to_char[count]
# alternative method to clear empty buckets, producing a regular dict
# count_to_char = {c: b for c, b in count_to_char.items() if b}
The way to avoid that scan over K buckets is to use a counter, which you already use.