Tags: python, algorithm, big-o, time-complexity, exponential

Running Time Complexity of my Algorithm - how do I compute this and further optimize the algorithm?


I designed a recursive algorithm and wrote it in Python. When I measure the running time with different parameters, it seems to take exponential time. Moreover, it runs for more than half an hour on inputs as small as 50. (I didn't wait for it to finish, but it doesn't seem to finish in a reasonable amount of time, so I guess it's exponential.)

So, I'm curious about the running time complexity of this algorithm. Can someone please help me derive the equation T(n,m), or at least compute the big-O bound?

The algorithm is below:

# parameters:
# search string, the index where we left on the search string, source string, index where we left on the source string,
# and the indexes array, which keeps track of the indexes found for the characters
def find(search, searchIndex, source, sourceIndex, indexes):
    found = None
    if searchIndex < len(search): # if we haven't reached the end of the search string yet
        found = False
        while sourceIndex < len(source): # loop thru the source, from where we left off
            if search[searchIndex] == source[sourceIndex]: # if there is a character match
                # recursively look for the next character of search string 
                # to see if it can be found in the remaining part of the source string
                if find(search, searchIndex + 1, source, sourceIndex + 1, indexes):
                    # we have found it
                    found = True # set found = true
                    # if an index for the character in search string has never been found before.
                    # i.e if this is the first time we are finding a place for that current character
                    if indexes[searchIndex] is None:
                        indexes[searchIndex] = sourceIndex # set the index where a match is found
                    # otherwise, if an index has been set before but it's different from what
                    # we are trying to set right now. so that character can be at multiple places.
                    elif indexes[searchIndex] != sourceIndex: 
                        indexes[searchIndex] = -1 # then set it to -1.
            # increment sourceIndex at each iteration so as to look for the remaining part of the source string. 
            sourceIndex = sourceIndex + 1
    return found if found is not None else True

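# NOTE: isPrime was not included in the original post; this trial-division
# version is an assumed stand-in so that the example runs.
def isPrime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def theCards(N, colors):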
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    # so in this example where N=7, allcards=['B','R','R','B','R','B','R']
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]
    # indexes is initially None.
    indexes = [None] * len(colors)

    find(colors, 0, allcards, 0, indexes)
    return indexes    

if __name__ == "__main__":
    print(theCards(7, list("BBB")))

I don't know if one has to understand the problem and the algorithm in order to derive the worst-case running time, but here is the problem I attempted to solve:

The Problem:

Given a source string SRC and a search string SEA, find SEA as a subsequence of SRC and return the indexes where each character of SEA was found in SRC. If a character of SEA can be at multiple places in SRC, return -1 for that character's position.

For instance, if the source string is BRRBRBR (N=7) and the search string is BBB, then the first 'B' in 'BBB' can only appear at index 0 of the source string. The second 'B' can only be at index 3 of the source string and the last 'B' at index 5. There are no other alternatives for the positions of the characters of 'BBB', and thus the algorithm returns [0,3,5].

In another case, where the source string is BRRBRB (N=6) and the search string is RBR, the first 'R' of 'RBR' can be at position 1 or 2. This leaves only position 3 for 'B' and position 4 for the last 'R'. So the first 'R' can be at multiple places and its place is ambiguous, while the other two characters, B and R, have only one place each. With the same 0-based indexes as in the first example, the algorithm returns [-1,3,4].
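To make the expected outputs of the two examples concrete, here is a minimal brute-force sketch. It is only a reference for tiny inputs, not the algorithm from the post: it enumerates every way of embedding the search string in the source string and reports a position only when all embeddings agree on it. The name `brute_force_positions` is hypothetical, and positions are 0-based, matching the indexes the recursive `find` records.

```python
from itertools import combinations

def brute_force_positions(source, search):
    """Reference answer for tiny inputs (exponential time).

    Returns a list with one entry per character of `search`: the 0-based
    position in `source` if it is the same in every embedding, else -1.
    Returns None if `search` is not a subsequence of `source`.
    """
    # Every strictly increasing tuple of positions whose characters spell `search`.
    embeddings = [
        combo
        for combo in combinations(range(len(source)), len(search))
        if all(source[p] == ch for p, ch in zip(combo, search))
    ]
    if not embeddings:
        return None

    result = []
    for i in range(len(search)):
        places = {emb[i] for emb in embeddings}
        result.append(places.pop() if len(places) == 1 else -1)
    return result

print(brute_force_positions("BRRBRBR", "BBB"))  # [0, 3, 5]
print(brute_force_positions("BRRBRB", "RBR"))   # [-1, 3, 4]
```

Counted from 1 instead of 0, the second result would read [-1, 4, 5].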

The case where the algorithm doesn't finish and takes forever is when the source string is ['B', 'R', 'R', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B'] (N=58) and the search string is RBRRBRBBRBRRBBRRBBBRRBBBRR. It should return [-1, -1, -1, -1, -1, -1, -1, -1, 17, 18, 19, 23, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 47, 53], but unfortunately it doesn't =(

Optimizations:

I thought of halting the search when the 'indexes' list is completely full of -1s. But that only improves the best case (or maybe the average case), not the worst case. How can one further optimize this algorithm? I know that a polynomial-time solution to this problem exists.

More important than the optimizations, I'm really curious about the T(n,m) equation of the running time, where n and m are the lengths of the source and search strings.

If you were able to read until here, thank you very much! =)

EDIT - IVIad's solution implemented:

def find2(search, source):
    indexes = list()
    last = 0
    for ch in search:
        if last >= len(source):
            break
        while last < len(source) and source[last] != ch:
            last = last + 1
        indexes.append(last)
        last = last + 1
    return indexes

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]

    indexes = find2(colors, allcards) # find the indexes of the first occurrences of the characters
    colors.reverse() # now reverse both strings
    allcards.reverse()
    # and find the indexes of the first occurrences of the characters, again, but in reversed order
    indexesreversed = find2(colors, allcards)
    indexesreversed.reverse() # reverse back the resulting list of indexes 
    indexesreversed = [len(allcards) - i - 1 for i in indexesreversed] # fix the indices

    # return -1 if the indices are different when strings are reversed
    return [indexes[i] + 1 if indexes[i] == indexesreversed[i] else -1 for i in range(len(indexes))]

if __name__ == "__main__":
    print(theCards(495, list("RBRRBRBBRBRRBBRRBBBRRBBBRR")))
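The reason the forward and reversed passes are enough can be sketched as follows: in any valid placement, character i sits somewhere between its position in the leftmost greedy match and its position in the rightmost greedy match, so its place is unique exactly when those two positions coincide. The sketch below expresses that idea without reversing the lists. The function names are hypothetical, positions are 0-based (the code above adds 1), and returning None when the search string is not a subsequence is an added assumption, not something the original code does.

```python
def leftmost(search, source):
    """Greedy left-to-right match; 0-based positions, or None if no match."""
    positions = []
    j = 0
    for ch in search:
        while j < len(source) and source[j] != ch:
            j += 1
        if j == len(source):
            return None  # ran off the end: not a subsequence
        positions.append(j)
        j += 1  # skip past this match
    return positions

def rightmost(search, source):
    """Greedy right-to-left match; 0-based positions, or None if no match."""
    positions = []
    j = len(source) - 1
    for ch in reversed(search):
        while j >= 0 and source[j] != ch:
            j -= 1
        if j < 0:
            return None
        positions.append(j)
        j -= 1
    positions.reverse()
    return positions

def unique_positions(search, source):
    """Position of each search character if unambiguous, else -1."""
    lo = leftmost(search, source)
    hi = rightmost(search, source)
    if lo is None or hi is None:
        return None
    return [a if a == b else -1 for a, b in zip(lo, hi)]

print(unique_positions("BBB", "BRRBRBR"))  # [0, 3, 5]
print(unique_positions("RBR", "BRRBRB"))   # [-1, 3, 4]
```

Each pass advances its source pointer at most n times in total, so the whole computation is O(n + m).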

Solution

  • You can do it in O(n + m), where m is the number of characters in SEA, and n the number of characters in SRC:

    last = 1
    for i = 1 to m do
        while last <= n and SRC[last] != SEA[i]
            ++last

        if last > n
            stop (SEA is not a subsequence of SRC)

        print last
        ++last (skip this match)
    

    Basically, for each character in SEA, find its position in SRC, but only scan after the position where you found the previous character.

    For instance, if the source string is BRRBRBR (N=7) and the search string is BBB:

    Find B in SRC: found at last = 1, print 1, set last = 2.

    Find B in SRC: found at last = 4, print 4, set last = 5.

    Find B in SRC: found at last = 6, print 6, set last = 7. Done.


    As for the complexity of your algorithm, I'm not able to provide a very formal analysis, but I'll try to explain how I'd go about it.

    Assume the worst case, in which every character of SRC and SEA is the same, so every comparison matches. We can then ignore the condition inside your while loop. Also note that your while loop executes up to n times.

    Note that for the first character you will call find(1, 1), ..., find(m, n). But each of these starts its own while loop and makes its own recursive calls: each find(i, j) makes one recursive call per iteration of its while loop. Those calls in turn make more recursive calls themselves, resulting in a sort of "avalanche effect" that causes exponential complexity.

    So you have:

    i = 1: calls find(2, 2), find(3, 3), ..., find(m, n)
           find(2, 2) calls find(3, 3), ..., find(m, n)
           find(3, 3) calls find(4, 4), ..., find(m, n)
           find(4, 4) calls find(5, 5), ..., find(m, n)
           ...
           total calls: O(m^m)
    i = 2: same, but start from find(2, 3).
    ...
    i = n: same
    

    Total complexity thus looks like O(n*m^m). I hope this makes sense and I haven't made any mistakes.
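As a cross-check of the rough estimate above, the avalanche can be counted directly. The sketch below is an assumption-laden model, not the code from the question: it keeps only the call structure of `find` in the all-characters-equal case (every comparison matches, so every loop iteration recurses) and drops the bookkeeping of `indexes`. In that model a call at depth d corresponds to a choice of d increasing source positions, which suggests the total number of calls is C(n,0) + C(n,1) + ... + C(n,m), at most 2^n. The names `count_calls` and `binomial_sum` are hypothetical.

```python
from math import comb  # Python 3.8+

def count_calls(n, m):
    """Count calls in a model of find() where every character matches.

    n is the source length, m the search length.
    """
    calls = 0

    def find(search_index, source_index):
        nonlocal calls
        calls += 1
        if search_index < m:
            # Every remaining source position matches, so each one recurses.
            for k in range(source_index, n):
                find(search_index + 1, k + 1)

    find(0, 0)
    return calls

def binomial_sum(n, m):
    """C(n,0) + C(n,1) + ... + C(n,m)."""
    return sum(comb(n, d) for d in range(m + 1))

for n, m in [(5, 5), (10, 4), (16, 8)]:
    print(n, m, count_calls(n, m), binomial_sum(n, m))
```

If the model is right, the two counts agree for every pair tried, and the growth is exponential in n once m is a constant fraction of n, which is consistent with the N=58 case in the question never finishing.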