Search code examples
pythonstringfor-loopalphabetical

Find Longest Alphabetically Ordered Substring - Efficiently


The goal of some a piece of code I wrote is to find the longest alphabetically ordered substring within a string.

"""
Find longest alphabetically ordered substring in string s.
"""

s = 'zabcabcd' # Test string.

alphabetical_str, temp_str = s[0], s[0]
for i in range(len(s) - 1):  # Loop through string.
    if s[i] <= s[i + 1]:  # Check if next character is alphabetically next.
        temp_str += s[i + 1]  # Add character to temporary string.
        if len(temp_str) > len(alphabetical_str):  # Check is temporary string is the longest string.
            alphabetical_str = temp_str  # Assign longest string.
    else:
        temp_str = s[i + 1]  # Assign last checked character to temporary string.
print(alphabetical_str)

I get an output of abcd.

But the instructor says there is PEP 8 compliant way of writing this code that is 7-8 lines of code and there is a more computational efficient way of writing this code that is ~16 lines. Also that there is a way of writing this code in only 1 line 75 character!

Can anyone provide some insight on what the code would look like if it was 7-8 lines or what the most work appropriate way of writing this code would be? Also any PEP 8 compliance critique would be appreciated.


Solution

  • Depending on how you choose to count, this is only 6-7 lines and PEP 8 compliant:

    def longest_alphabetical_substring(s):
        sub = '', 0
        for i in range(len(s)):
            j = i + len(sub) + 1
            while list(s[i:j]) == sorted(s[i:j]) and j <= len(s):
                sub, j = s[i:j], j+1
        return sub
    
    
    print(longest_alphabetical_substring('zabcabcd'))
    

    Your own code was PEP 8 compliant as far as I can tell, although it would make sense to capture code like this in a function, for easy reuse and logical grouping for improved readability.

    The solution I provided here is not very efficient, as it keeps extracting copies of the best result so far. A slightly longer solution that avoids this:

    def longest_alphabetical_substring(s):
        n = m = 0
        for i in range(len(s)):
            for j in range(i+1, len(s)+1):
                if j == len(s) or s[j] < s[j-1]:
                    if j-i > m-n:
                        n, m = i, j
                    break
        return s[n:m]
    
    
    print(longest_alphabetical_substring('zabcabcd'))
    

    There may be more efficient ways of doing this; for example you could detect that there's no need to keep looking because there is not enough room left in the string to find longer strings, and exit the outer loop sooner.

    User @kellybundy is correct, a truly efficient solution would be linear in time. Something like:

    def las_efficient(s):
        t = s[0]
        return max([(t := c) if c < t[-1] else (t := t + c) for c in s[1:]], key=len)
    
    
    print(las_efficient('zabcabcd'))
    

    No points for readability here, but PEP 8 otherwise, and very brief.

    And for an even more efficient solution:

    def las_very_efficient(s):
        m, lm, t, ls = '', 0, s[0], len(s)
        for n, c in enumerate(s[1:]):
            if c < t[-1]:
                t = c
            else:
                t += c
                if len(t) > lm:
                    m, lm = t, len(t)
                if n + lm > ls:
                    break
        return m