Tags: python, string, python-3.x, substring, palindrome

What's the time complexity of this algorithm for finding the longest palindromic substring?


Here's the Python code:

def is_palindrome(s):
    return s == s[::-1]


def longestp(s):
    if is_palindrome(s):
        return s

    maxp = s[0]

    for i in range(len(s)-1):
        half_length = len(maxp) // 2
        start = i - half_length
        end = i + half_length

        while start >= 0 and end <= len(s)-1:
            if is_palindrome(s[start:end+2]):
                if len(s[start:end+2]) > len(maxp):
                    maxp = s[start:end+2]
                end += 1
            elif is_palindrome(s[start:end+1]):
                if len(s[start:end+1]) > len(maxp):
                    maxp = s[start:end+1]
                start -= 1
                end += 1
            else:
                break

    return maxp

I initially thought it was O(n^3) because of two nested loops and string slicing, but it turned out to be nearly linear in my tests. Is there any kind of input for which this algorithm is going to be slow?


Solution

  • The algorithm looks as if it needs total time proportional to

    integral_0^N x dx = [(x^2)/2]_0^N = (N^2)/2 = O(N^2)
    

    because in the worst case the current palindrome keeps growing one character at a time, and each is_palindrome check on a window of length x costs O(x): summing (integrating) those growing costs up to N gives N^2/2.

    Strings matching ab* (a single a followed by nothing but bs) should give this worst-case behavior: once the scan reaches the b-run, it keeps extending the all-b palindrome by one character and re-checks the whole window each time.
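    The linear cost per check is easy to see in isolation. Here is a small standalone sketch (my addition, repeating is_palindrome from the question and using made-up window sizes) that times a single palindrome check as the window grows:

```python
from timeit import timeit

def is_palindrome(s):
    return s == s[::-1]

# One check copies the slice and its reversal, so its cost is linear in
# the window length: doubling k should roughly double the time per call.
for k in (10_000, 20_000, 40_000):
    window = "b" * k
    t = timeit(lambda: is_palindrome(window), number=200)
    print("k=%d: %.4fs" % (k, t))
```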

    Here is a piece of code that demonstrates this worst-case behavior experimentally.

    The structure is as follows:

    1. Define a worstCase function that constructs "bad" strings of length N.
    2. Measure the running time of your function on these strings.
    3. Build a dataset of log(N) vs. log(time(N)).
    4. Fit a line and estimate its slope: this is the exponent p in your O(N^p).

    Here is the code:

    def worstCase(length):
        return "a" + "b" * (length - 1)
    
    # time.clock() was deprecated and removed in Python 3.8;
    # perf_counter() is the recommended wall-clock timer.
    from time import perf_counter
    from math import log
    
    xs = []
    ys = []
    for n in [4 * int(1000 * 1.2 ** k) for k in range(1, 20)]:
        s = worstCase(n)
        assert len(s) == n
        startTime = perf_counter()
        p = longestp(s)
        endTime = perf_counter()
        assert p == s[1:]  # the longest palindrome is the whole run of b's
        t = endTime - startTime
        xs.append(log(n))
        ys.append(log(t))
        print("%d -> %f" % (n, t))
    
    from numpy import polyfit
    
    # The slope of the line fitted to (log N, log t) estimates the exponent.
    exponent, constant = polyfit(xs, ys, 1)
    
    print("Exponent was: %f" % (exponent))
    

    Here is the output (takes a minute or two):

    4800 -> 0.057818
    5760 -> 0.078123
    6908 -> 0.105169
    8292 -> 0.145572
    9952 -> 0.197657
    11940 -> 0.276103
    14332 -> 0.382668
    17196 -> 0.534682
    20636 -> 0.747468
    24764 -> 1.048267
    29720 -> 1.475469
    35664 -> 2.081608
    42796 -> 2.939904
    51356 -> 4.216063
    61628 -> 5.963550
    73952 -> 8.691849
    88744 -> 12.126039
    106492 -> 19.684188
    127788 -> 24.942766
    Exponent was: 1.867208    
    

    It estimates the exponent to be ~1.87, which is much closer to 2 than to 3.
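    The log-log fit itself can be sanity-checked with synthetic data (constants made up here): timings that follow an exact power law T = c * N^2 should recover a slope of 2, which confirms that the fitted slope really does read off the exponent.

```python
from math import log
from numpy import polyfit

# Synthetic timings following an exact power law T = c * N^2
ns = [1000 * 2 ** i for i in range(8)]
ts = [3e-9 * n ** 2 for n in ns]

# Fit a line to (log N, log T); its slope is the exponent.
slope, intercept = polyfit([log(n) for n in ns], [log(t) for t in ts], 1)
print("recovered exponent: %f" % slope)  # prints 2.000000
```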