Search code examples
pythonpython-3.xstringdictionarysubstring

find the Longest Substring Without Repeating Characters. My code works for half of the test cases. how do i change it for it work everytime


This is a leetcode question : Given a string s, find the length of the longest substring without repeating characters.
xample 1:

Input: s = "abcabcbb" Output: 3 Explanation: The answer is "abc", with the length of 3. Example 2:

Input: s = "pwwkew" Output: 3 Explanation: The answer is "wke", with the length of 3. Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.
My code failed when it encountered this string : s = "dvdf"
I know what the problem is but I just can't code it to make it work. I know that my program will not star over from "v" after the char gets reset and will start from "d" but i dont know hot to make it do that. This is my code :

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        substring_length = {}
        i = 0
        count = 0
        char = ""
        while i != len(s) :
            if s[i] not in char:
                char = char + s[i]
                count += 1
                i += 1
                if i == len(s) :
                    substring_length[char] = count
            elif s[i] in char:
                substring_length[char] = count
                char = ""
                count = 0
            
        if len(substring_length) == 0:
            substring_length[s] = len(s)
        return max(substring_length.values())

I know there are other solutions to my problem, but I'm sure mine would work if I can just find how to solve this issue.


Solution

  • You effectively answered your own question. Your method requires you to double back each time you hit a duplicate. But you are not decrementing the i to do that before resetting the count. You need to scan back into the string and start again from just after where the run started.

                elif s[i] in char:
                    substring_length[char] = count
                    char = ""
                    i -= (count - 1)
                    count = 0
    

    Some comments:

    • the elif line could just be else since it is the exact negation of the if condition.
    • As you say, there are other solutions to the problem, and if you avoid the backtracking then you'll get a faster result.

    (Edited to discuss efficiency - not enough space or formatting capability in the comment section)

    But isn't my program still O(n). so it would be as fast as the sliding window approach no?

    That depends on what we are counting! The in operator could be implemented in constant time with reasonable memory and the knowledge that we aren't going to store duplicate characters. So we'll presume we already did that, ignore the real cost of that operator as implemented now, and just focus on how many times we need to call it. This gives a reasonable approximation to the complexity of the algorithm.

    (I'm removing your redundant check in the elif clause for the purposes of the below, and calling this Alg.1)

    Your algorithm calls in for each character as it works its way forward through the string, but also repeats those checks when it needs to backtrack. The backtracking adds a variable amount of operations depending on the string in question.

    Example: abcabcbb requires your algoritm to check the presence of a character in the string 25 times; and pwwkew requires 12 checks.

    There was briefly another algorithm posted as an answer which seems to have disappeared. It naively checks all substrings that start at each position until it finds a duplicate. So it walks the strings a bit less efficiently. Let's call that Alg.2.

    By contrast, the below algorithm performs the in operation once for each character in the input, and once more for each character between the start of the current longest substring and the repeated character.

        def lengthOfLongestSubstring(self, s: str) -> int:
            currentSubstring = ""
            longestSubstring = s[0]
            for i in s:
                if len(longestSubstring) < len(currentSubstring):
                    longestSubstring = currentSubstring
                while i in currentSubstring:
                    currentSubstring = currentSubstring[1:]
                currentSubstring += i
            if len(longestSubstring) < len(currentSubstring):
                longestSubstring = currentSubstring
            return len(longestSubstring), longestSubstring
    

    If we generate many (I did 10000) random strings of 27 characters (first guarantee of a duplicate) and 1000 characters (stress any inefficiencies), then count the calls to the in operator for each algorithm we get an average operation count of:

    Alg.#  Ops for 27 chars  Ops for 1000 chars  Ops for 'a..za'  Ops for abab..aba  Ops for aabb..zzaa
    Alg.1      145               7032                 53              77                 132
    Alg.2      165               7053                378              78                 132
    Alg.3       48               1994                 28              52                 105
    

    Also included the number of operations for scenarios designed to be bad for each algorithm.