In a long string of a small finite number of unique letters, find the longest substring that contains all unique letters repeated no more than k times

I've read through the similar questions and cannot find any already posted question quite like mine.

Problem: In a long string consisting of a small number of letters (alphabet of this string is relatively small), find the longest possible substring (keeping the order as is) that contains at most k repetitions of each letter.

Example: Input string xyzxyxxxxxzzz; k=2. Here we have a string from the alphabet {x, y, z}. Longest possible substrings with no letter repeating more than 2 times would be xyzxy or yzxyx.

Another example: abbbbcccddddabcd; k=1. Solutions could be any of dabc, abcd.

I can't quite figure out how to formalize the solution into code. What I'm thinking is an approach where:

the string is read from left to right and the count of each occurrence of each unique letter from the alphabet is kept in let's say a Python dict
if any of the letters repeats more than k times, cut the string before the offensive letter, keep it in an array of possible solutions
continue reading the string to the end and keep any possible solution
do the same for the sub-string of the original string minus the first letter and call recursively until the string is exhausted or you have less string left than unique letters in the alphabet

Somehow I think this would be too complicated and there must be an easier solution which is obvious but for some reason I can't see it.

Solution

Your approach is not far from an efficient solution. This works best with a sliding window algorithm where that window can grow at the right and shrink from the left. So your second step would need a change

Instead of:

if any of the letters repeats more than k times, cut the string before the offensive letter, keep it in an array of possible solutions

Do:

if any of the letters repeats more than k times, cut the string from the left side up to the first occurrence of that offending letter.

No recursion is needed. At each step where you add a character to the right of the current window, check if it is longer than the best one you had so far.

Implementation:

from collections  import defaultdict

def solve(s, k):
    freq = defaultdict(int)
    left = 0
    bestLeft = 0
    bestRight = -1
    for right, ch in enumerate(s):
        if freq[ch] < k:
            freq[ch] += 1
        else:
            while s[left] != ch:
                freq[s[left]] -= 1
                left += 1
            left += 1
        if right - left > bestRight - bestLeft:
            bestLeft, bestRight = left, right
    return s[bestLeft: bestRight+1]
    
s = "xyzxyxxxxxzzz"
k = 2
print(solve(s, k))  # xyzxy

s = "abbbbcccddddabcd"
k = 1
print(solve(s, k))  # dabc