Let's say I have the following strings in Python3.x
string1 = 'AAAAABBBBCCCDD'
string2 = 'CCBADDDDDBACDC'
string3 = 'DABCBEDCCAEDBB'
I would like to create a summary "frequency string" that counts the number of characters in the string in the following format:
string1_freq = '5A4B3C2D' ## 5 A's, followed by 4 B's, 3 C's, and 2D's
string2_freq = '2C1B1A5D1B1A1C1D1C'
string3_freq = '1D1A1B1C1B1E1D2C1A1E1D2B'
My problem:
How would I quickly create such a summary string?
My idea would be: create an empty list to keep track of the count. Then create a for loop which checks the next character. If there's a match, increase the count by +1 and move to the next character. Otherwise, append to end of the string 'count' + 'character identity'.
That's very inefficient in Python. Is there a quicker way (maybe using the functions below)?
There are several ways to count the elements of a string in python. I like collections.Counter
, e.g.
from collections import Counter
counter_str1 = Counter(string1)
print(counter_str1['A']) # 5
print(counter_str1['B']) # 4
print(counter_str1['C']) # 3
print(counter_str1['D']) # 2
There's also str.count(sub[, start[, end]
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
As an example:
print(string1.count('A')) ## 5
The following code accomplishes the task without importing any modules.
def freq_map(s):
num = 0 # number of adjacent, identical characters
curr = s[0] # current character being processed
result = '' # result of function
for i in range(len(s)):
if s[i] == curr:
num += 1
else:
result += str(num) + curr
curr = s[i]
num = 1
result += str(num) + curr
return result
Note: Since you requested a solution based on performance, I suggest you use this code or a modified version of it.
I have executed rough performance test against the code provided by CoryKramer for reference. This code performed the same function in 58% of the time without using external modules. The snippet can be found here.