Tags: python, time-complexity, big-o, mergesort

Analysing merge sort complexity


The code I wrote is as follows:

def merge(A, B): #merging two sorted lists
   index_a = index_b = 0
   result = []
   while(index_a < len(A) and index_b < len(B)):
       if(A[index_a] <= B[index_b]):
           result += [A[index_a]]
           index_a += 1
       else:
           result += [B[index_b]]
           index_b += 1
   if(index_a != len(A)): # A still has elements left over; append them all
       result += A[index_a:]
   elif(index_b != len(B)): # otherwise B may have leftovers
       result += B[index_b:]
   return result

def merge_sort(L):
   if(len(L) == 0 or len(L) == 1):
       return L
   else:
       a = merge_sort( L[:len(L)//2] )  # floor division: len(L)/2 is a float in Python 3
       b = merge_sort( L[len(L)//2:] )
       return merge(a , b)

So I understand that the merge_sort function shown above makes log(n) levels of recursive calls. What slightly confused me is the number of steps taken in the merge function: first the two assignments, but after that I don't know how to count the steps in the worst case, because I can't work out where the loop stops relative to the lengths of A and B, or how to account for the concatenation afterwards.

Any help would be really appreciated. Thank you.


Solution

  • The way you've structured your merge, while slightly more performant, makes it trickier to see the complexity. Try this:

    def merge(A, B): #merging two sorted lists
       index_a = index_b = 0
       result = []
       while(index_a < len(A) or index_b < len(B)): # NOTE OR
           if index_a < len(A) and (index_b == len(B) or A[index_a] <= B[index_b]): # NOTE EXTRA CONDITIONS
               result += [A[index_a]]
               index_a += 1
           else:
               result += [B[index_b]]
               index_b += 1
       return result
    
    def merge_sort(L):
       ...
    

    Here I've replaced the "then dump the rest" code with a change to the while-loop condition (allowing the loop to run until both input lists are exhausted) and an extra condition in the if-statement to make that work properly. If you assume (as is normal) that concatenating two lists is O(n) in the size of the second list, the algorithmic complexity is unchanged by this modification. (It's unchanged even if you assume concatenation is constant time, but that case is slightly more complicated to explain.)
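
    To see the extra conditions at work, trace a case where one list empties first (a quick illustrative check of my own, assuming the modified merge above is defined):

    # Once A is exhausted, index_a == len(A) makes the if-condition false,
    # so the else-branch keeps draining B -- this replaces the old
    # "then dump the rest" path. Symmetrically, index_b == len(B) forces
    # the if-branch to keep draining A.
    print(merge([1, 2], [3, 4, 5, 6]))  # [1, 2, 3, 4, 5, 6]
    print(merge([9], []))               # [9]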

    Now you can see that the number of times the while-loop executes is exactly the sum of the lengths of the two lists: every iteration moves exactly one element from A or B into the result. So there's no separate "worst case" to reason about, as the sketch below confirms.
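
    If you want to verify that count empirically, here's a small instrumented sketch (my own addition, not from the original answer; merge_count is a hypothetical name):

    def merge_count(A, B):
       # Same loop as the modified merge, but counts its iterations too.
       index_a = index_b = 0
       result = []
       steps = 0
       while index_a < len(A) or index_b < len(B):
           steps += 1  # each iteration moves exactly one element into result
           if index_a < len(A) and (index_b == len(B) or A[index_a] <= B[index_b]):
               result += [A[index_a]]
               index_a += 1
           else:
               result += [B[index_b]]
               index_b += 1
       return result, steps

    result, steps = merge_count([1, 3, 5, 7], [2, 4])
    print(steps == len([1, 3, 5, 7]) + len([2, 4]))  # True: always len(A) + len(B) steps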