Search code examples
pythonalgorithm

In a long string of a small finite number of unique letters, find the longest substring that contains all unique letters repeated no more than k times


I've read through the similar questions and cannot find any already posted question quite like mine.

Problem: In a long string consisting of a small number of letters (alphabet of this string is relatively small), find the longest possible substring (keeping the order as is) that contains at most k repetitions of each letter.

Example: Input string xyzxyxxxxxzzz; k=2. Here we have a string from the alphabet {x, y, z}. Longest possible substrings with no letter repeating more than 2 times would be xyzxy or yzxyx.

Another example: abbbbcccddddabcd; k=1. Solutions could be any of dabc, abcd.

I can't quite figure out how to formalize the solution into code. What I'm thinking is an approach where:

  • the string is read from left to right and the count of each occurrence of each unique letter from the alphabet is kept in let's say a Python dict
  • if any of the letters repeats more than k times, cut the string before the offensive letter, keep it in an array of possible solutions
  • continue reading the string to the end and keep any possible solution
  • do the same for the sub-string of the original string minus the first letter and call recursively until the string is exhausted or you have less string left than unique letters in the alphabet

Somehow I think this would be too complicated and there must be an easier solution which is obvious but for some reason I can't see it.


Solution

  • Your approach is not far from an efficient solution. This works best with a sliding window algorithm where that window can grow at the right and shrink from the left. So your second step would need a change

    Instead of:

    if any of the letters repeats more than k times, cut the string before the offensive letter, keep it in an array of possible solutions

    Do:

    if any of the letters repeats more than k times, cut the string from the left side up to the first occurrence of that offending letter.

    No recursion is needed. At each step where you add a character to the right of the current window, check if it is longer than the best one you had so far.

    Implementation:

    from collections  import defaultdict
    
    def solve(s, k):
        freq = defaultdict(int)
        left = 0
        bestLeft = 0
        bestRight = -1
        for right, ch in enumerate(s):
            if freq[ch] < k:
                freq[ch] += 1
            else:
                while s[left] != ch:
                    freq[s[left]] -= 1
                    left += 1
                left += 1
            if right - left > bestRight - bestLeft:
                bestLeft, bestRight = left, right
        return s[bestLeft: bestRight+1]
        
    s = "xyzxyxxxxxzzz"
    k = 2
    print(solve(s, k))  # xyzxy
    
    s = "abbbbcccddddabcd"
    k = 1
    print(solve(s, k))  # dabc