Find the length of longest substring with at most K repeating characters

PROBLEM: Given integers n, k (1 <= n <= 100.000), (1 <= k <= n). Find the length of the longest substring with at most K repeating characters and find position of the first element in the substring from the original string. EXAMPLES
input: 6, 2 'abbbaa' output: 4 3 //here, 4 is max length and 3 is the position of the first element of substring 'bbaa' in the original string. Thus, each character repeats k times here, so it`s OK

input: 3, 1 'abb' output: 2, 1 //yeah, the required substring is 'ab' I hope the statement is clear, at least I tried, `cause English is not my native language.

Don`t know how to proceed in designing the algorithm for this problem. I should use binary search the answer technique, as it seems from the statement of the task.

First things first: I tried at least to develop my thoughts on my way of tackling this problem, so it is:

Find the middle, and we cut the string in two slices from 0 to mid+1 and mid to n respectively. (I code in Python, so it`s very natural to use slices here, I think)
Look for the number of unique characters (I used len(set(slice))) and compare: if the number of 1-st (left) slice is bigger then we change the right = mid, else left
All this instructions are put in the 'while right > left + 1' loop.

So far so good, but probably I missed sth or this idea isn`t an efficient one to get the position of the first element of the substring looking at its position in given string.

Code (Python 3): to find the position of the first element, using binsearch

n, k = map(int, input().split())
s = input()
left = 0
right = n - 1

while (right > left + 1):
    mid = (left + right) // 2
    char1 = len(set(s[0:mid+1]))
    char2 = len(set(s[mid:n]))
    if char1 >= char2:
        right = mid
    else:
        left = mid
print(left + 1)

Code to find the length of the longest substring without bin.search (I thought it will be easier, and it works fine, though naive approach)

n, k = map(int, input().split())
s = input()
letters = [0] * 26
max_len = 0
for i in range(len(s)):
    letters[ord(s[i]) - ord('a')] = letters[ord(s[i]) - ord('a')] + 1
for i in range(0, 26):
    if letters[i] >= k:
        max_len = max_len + k
    else:
        max_len = max_len + letters[i]
print(max_len)

Solution

Your approach is greedy and won’t get right results. Consider input 4 4 ccab - your binary search code will return 2, but it should return 1 (whole string is good). To solve this problem you can use method called two pointers. We define two pointers a and b as the beginning and end of substring that we are currently considering. We also need to remember how many occurences of each letter there are in current substring. First, set a and b to 0. Then try to expand current substring by moving its end by 1 (b=b+1) and as we do that increment the number of occurrences of letter s[b] by 1. If this number goes beyond K this means that our current string is no longer valid. To make it valid again we need to move it's beginning to the point where all letters of substring (a, b) occur less than or exactly K times. To do this we simply increment a by 1 and decrement oc[s[a]] until oc[s[b]] <= k. Time complexity is linear, because we increment b exactly n times and a at most n times. At each step we also check if b-a+1>max_len if so set max_len to b-a+1 and the beginning to a.

n, k = map(int, input().split())
s = input()
a = 0
oc = [0]*26
max_len = 0
first = -1

for b in range(0, n):
    oc[ord(s[b])-ord('a')]+=1
    while(oc[ord(s[b])-ord('a')]>k):
        oc[ord(s[a])-ord('a')]-=1
        a+=1
    if b-a+1>max_len:
        max_len=b-a+1
        first=a+1

print(first)
print(max_len)

oc is array where we store occurences of letters in substring.