
Python strings: quickly summarize the character count in order of appearance


Let's say I have the following strings in Python 3.x:

string1 = 'AAAAABBBBCCCDD'
string2 = 'CCBADDDDDBACDC'
string3 = 'DABCBEDCCAEDBB'

I would like to create a summary "frequency string" that counts each run of consecutive identical characters, in order of appearance, in the following format:

string1_freq = '5A4B3C2D'  ## 5 A's, followed by 4 B's, 3 C's, and 2 D's
string2_freq = '2C1B1A5D1B1A1C1D1C' 
string3_freq = '1D1A1B1C1B1E1D2C1A1E1D2B' 

My problem:

How would I quickly create such a summary string?

My idea would be: create an empty list to keep track of the count, then write a for loop that checks the next character. If it matches, increase the count by 1 and move to the next character; otherwise, append 'count' + 'character' to the end of the result string.

That seems very inefficient in Python. Is there a quicker way (maybe using one of the functions below)?
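One quicker route, not taken from the question itself, is itertools.groupby from the standard library: it groups consecutive equal elements, so each group is exactly one run. A minimal sketch; the function name run_length_summary is hypothetical:

```python
from itertools import groupby

def run_length_summary(s: str) -> str:
    # groupby yields (char, group) once per run of consecutive equal characters;
    # len(list(group)) is the length of that run
    return ''.join(f'{len(list(group))}{char}' for char, group in groupby(s))

print(run_length_summary('AAAAABBBBCCCDD'))   # 5A4B3C2D
print(run_length_summary('CCBADDDDDBACDC'))   # 2C1B1A5D1B1A1C1D1C
```

An empty string produces no groups, so it returns ''.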

There are several ways to count the elements of a string in Python. I like collections.Counter, e.g.

from collections import Counter
counter_str1 = Counter(string1)
print(counter_str1['A']) # 5
print(counter_str1['B']) # 4
print(counter_str1['C']) # 3
print(counter_str1['D']) # 2
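Note that Counter reports one total per character for the whole string, not per run. For string1 the totals happen to match the runs, but a string like string2, where characters reappear, shows the difference (the variable name counts is just for illustration):

```python
from collections import Counter

string2 = 'CCBADDDDDBACDC'
counts = Counter(string2)

# Totals for the whole string, regardless of where each character appears
print(counts['C'], counts['D'])   # 4 6
# The desired summary is '2C1B1A5D1B1A1C1D1C': C occurs in runs of 2, 1 and 1,
# which a single total of 4 cannot recover.
```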

There's also str.count(sub[, start[, end]]):

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

As an example:

print(string1.count('A'))  ## 5
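Like Counter, str.count gives a total; the optional start and end arguments only restrict it to a slice. A short illustration on string2, where the slice bounds 4 and 9 were picked by hand to cover the run of D's:

```python
string2 = 'CCBADDDDDBACDC'

# Without start/end, count() totals every 'D' in the string
print(string2.count('D'))         # 6
# With start/end (slice semantics), it counts only string2[4:9] == 'DDDDD'
print(string2.count('D', 4, 9))   # 5
```

So str.count can measure a run only if you already know where the run starts and ends.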

Solution

  • The following code accomplishes the task without importing any modules.

    def freq_map(s):
        if not s:       # empty input: nothing to summarize (s[0] would raise IndexError)
            return ''
    
        num = 0         # number of adjacent, identical characters
        curr = s[0]     # current character being processed
        result = ''     # result of function
    
        for ch in s:
            if ch == curr:
                num += 1
            else:
                result += str(num) + curr   # close the finished run
                curr = ch
                num = 1
    
        result += str(num) + curr           # close the last run
    
        return result
    

    Note: Since you asked for a fast solution, I suggest using this code or a modified version of it.

    I ran a rough performance test against CoryKramer's code for reference. This code did the same job in 58% of the time, without importing any modules. The snippet can be found here.
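The benchmark snippet referenced above is not reproduced here, so the following is only a hypothetical timeit harness for running a similar comparison yourself. It does not assume what CoryKramer's code looked like; freq_groupby is an illustrative groupby-based version, and timings depend on your machine and Python version, so no numbers are claimed.

```python
import timeit
from itertools import groupby

def freq_map(s):
    # Explicit loop over adjacent characters, as in the answer above
    if not s:
        return ''
    num, curr, result = 0, s[0], ''
    for ch in s:
        if ch == curr:
            num += 1
        else:
            result += str(num) + curr
            curr, num = ch, 1
    return result + str(num) + curr

def freq_groupby(s):
    # Hypothetical comparison version built on itertools.groupby
    return ''.join(f'{sum(1 for _ in g)}{k}' for k, g in groupby(s))

samples = ['AAAAABBBBCCCDD', 'CCBADDDDDBACDC', 'DABCBEDCCAEDBB']

# Make sure both versions agree before timing them
for s in samples:
    assert freq_map(s) == freq_groupby(s)

for func in (freq_map, freq_groupby):
    # Time 10,000 passes over all three samples; results vary by machine
    t = timeit.timeit(lambda: [func(s) for s in samples], number=10_000)
    print(f'{func.__name__}: {t:.3f} s')
```

Checking that both functions return identical strings first keeps the timing comparison meaningful.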