I've read through the similar questions and cannot find any already posted question quite like mine.
Problem: In a long string consisting of a small number of letters (alphabet of this string is relatively small), find the longest possible substring (keeping the order as is) that contains at most k repetitions of each letter.
Example: Input string xyzxyxxxxxzzz; k=2. Here we have a string from the alphabet {x, y, z}. Longest possible substrings with no letter repeating more than 2 times would be xyzxy or yzxyx.
Another example: abbbbcccddddabcd; k=1. Solutions could be any of dabc, abcd.
I can't quite figure out how to formalize the solution into code. What I'm thinking is an approach where:
Somehow I think this would be too complicated and there must be an easier solution which is obvious but for some reason I can't see it.
Your approach is not far from an efficient solution. This works best with a sliding window algorithm where that window can grow at the right and shrink from the left. So your second step would need a change
Instead of:
if any of the letters repeats more than k times, cut the string before the offensive letter, keep it in an array of possible solutions
Do:
if any of the letters repeats more than k times, cut the string from the left side up to the first occurrence of that offending letter.
No recursion is needed. At each step where you add a character to the right of the current window, check if it is longer than the best one you had so far.
Implementation:
from collections import defaultdict
def solve(s, k):
freq = defaultdict(int)
left = 0
bestLeft = 0
bestRight = -1
for right, ch in enumerate(s):
if freq[ch] < k:
freq[ch] += 1
else:
while s[left] != ch:
freq[s[left]] -= 1
left += 1
left += 1
if right - left > bestRight - bestLeft:
bestLeft, bestRight = left, right
return s[bestLeft: bestRight+1]
s = "xyzxyxxxxxzzz"
k = 2
print(solve(s, k)) # xyzxy
s = "abbbbcccddddabcd"
k = 1
print(solve(s, k)) # dabc