Here I have to repeatedly remove the most frequent letter of a string (if two letters have the same frequency, take the alphabetically first) and append it to a new string.
Input:
abbcccdddd
Output:
dcdbcdabcd
The code I wrote is:
s = list(sorted(<the input string>))
a = []
for c in range(len(s)):
    # Recount the frequency of every letter still in s
    freq = [0 for _ in range(26)]
    for x in s:
        freq[ord(x) - ord('a')] += 1
    # Pick the alphabetically first letter among those with the highest count
    m = max(freq)
    allindices = [p for p, q in enumerate(freq) if q == m]
    r = chr(97 + allindices[0])
    a.append(r)
    s.remove(r)
print(''.join(a))
But it exceeded the allowed runtime limit, probably because of the nested loops. (There's another for loop, not shown, which separates the strings from the user input.)
I was hoping someone could suggest a more optimised version that uses less time and memory.
Your solution rescans the entire remaining string to rebuild the frequency table for every character it emits (and s.remove is itself a linear operation), making it quadratic overall, with a bunch of unnecessary conversions along the way. You can save that work by replacing the repeated scans with a single linear counting step, a linear pass to generate count-tagged repetitions, a sort to order the letters, and a final linear pass to strip the tags:
from collections import Counter  # For unsorted input
from itertools import groupby    # For already sorted input
from operator import itemgetter

def makenewstring(inp):
    # When inp is not guaranteed to be sorted:
    counts = Counter(inp).items()
    # Alternative if inp is guaranteed to be sorted:
    # counts = ((let, len(list(g))) for let, g in groupby(inp))

    # Create the appropriate number of repetitions of each letter, tagged with
    # a count, and sort to put each repetition of a letter in the correct order.
    # Negative n's make more common letters sort repeatedly to the start, not the end.
    repeats = sorted((n, let) for let, cnt in counts for n in range(0, -cnt, -1))
    # Remove the counts and join the letters
    return ''.join(map(itemgetter(1), repeats))
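For example, with the sample input from the question:

>>> makenewstring('abbcccdddd')
'dcdbcdabcd'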
Updated: It occurred to me that my original solution can be made much more concise, a one-liner in fact (excluding the required imports). It minimizes temporaries in favor of a single sort-by-key operation that uses a trick: sort each letter by the (negated) count of that letter seen so far:
from collections import defaultdict
from itertools import count

def makenewstring(inp):
    # Each letter gets its own count() iterator; next() yields 0, 1, 2, ... for
    # successive occurrences, so later occurrences of a letter sort earlier
    return ''.join(sorted(inp, key=lambda c, d=defaultdict(count): (-next(d[c]), c)))
This is actually the same basic logic as the original answer; it just has sorted perform the decoration and undecoration of the values implicitly, instead of doing it ourselves explicitly (implicit decorate/undecorate is the whole point of sorted's key argument; it's doing the Schwartzian transform for you).
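For illustration, here's what that looks like with the decorate/sort/undecorate steps written out explicitly; this is a sketch equivalent in behavior to the one-liner (the function name is just for the example):

from collections import defaultdict
from itertools import count

def makenewstring_explicit(inp):
    # Decorate: tag each character with its negated occurrence index so far,
    # so the k-th occurrence of a letter gets the key (-k, letter)
    seen = defaultdict(count)
    decorated = [(-next(seen[c]), c) for c in inp]
    # Sort: more-repeated occurrences (more negative) sort first; ties break alphabetically
    decorated.sort()
    # Undecorate: strip the tags, keeping only the letters
    return ''.join(c for _, c in decorated)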
Performance-wise, both approaches are similar. In practice, both scale linearly for smaller inputs (the one-liner up to inputs around 150 characters long; the longer Counter-based code up to inputs with len around 2000), and while the growth is super-linear above that point, it stays below the theoretical O(n log n) (likely because the data is not entirely random: the repeated counts and limited alphabet give Python's TimSort some existing ordering to take advantage of). The one-liner is somewhat faster for smaller strings (len 100 or less); the longer code is somewhat faster for larger strings (I'm guessing it has something to do with the longer code creating some ordering by grouping runs of counts for each letter). Really though, it hardly matters unless the input strings are expected to be huge.
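If you want to check these trends on your own machine, a quick timeit harness along these lines will do (the input sizes and repeat counts here are just illustrative):

import random
import string
import timeit

def bench(func, n, number=100):
    # Random lowercase input mirrors the limited-alphabet, repeated-count data
    inp = ''.join(random.choice(string.ascii_lowercase) for _ in range(n))
    return min(timeit.repeat(lambda: func(inp), number=number, repeat=5))

for n in (100, 1000, 10000):
    print(n, bench(makenewstring, n))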