
Big O Notation of a module nested in a built-in method in Python


I'm wondering how to reason about the big O complexity of built-in methods in Python. Given the following operation:

i_arr = ['1','2','5','3','4','5','5']

s_arr = sorted(set(i_arr))

My rationale is that sorted(set(i_arr)) is O(n^2), on the assumption that it amounts to two nested loops.

Based on advice, I dug into the higher-level modules in the Python source code, starting from my original function:

    def sorted(lst):
        l = list(lst)
        l.sort()
        return l

    def sort(self):
        sortable_files = sorted(map(os.path.split, self.files))

    def set(self):
        if not self._value:
            self._value = True

            for fut in self._waiters:
                if not fut.done():
                    fut.set_result(True)

Now I'm really confused :)


Solution

  • The upper bound for a comparison-based sorting algorithm is generally O(n*log(n)), as can be seen on Wikipedia. What remains is the cost of converting the list to a set. Checking whether an element is already in a set has average-case constant complexity, O(1), so constructing the set by adding each of the n elements one by one is O(n) overall: we iterate over the list once to build the set, which is O(n), and then sort the result, which is O(n*log(n)). That gives a total complexity of O(n*log(n) + n) = O(n*log(n)).
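    To make the two phases explicit, here is a small sketch (the helper name `sorted_unique` is mine, not from the question) that separates the O(n) set construction from the O(n*log(n)) sort:

    ```python
    def sorted_unique(items):
        # Phase 1: build the set element by element. Each add() is an
        # average-case O(1) hash insert, so this loop is O(n) overall.
        seen = set()
        for x in items:
            seen.add(x)
        # Phase 2: Timsort runs in O(n*log(n)), which dominates the
        # O(n) set construction, so the total is O(n*log(n)).
        return sorted(seen)

    i_arr = ['1', '2', '5', '3', '4', '5', '5']
    print(sorted_unique(i_arr))  # ['1', '2', '3', '4', '5']
    ```

    This is equivalent to `sorted(set(i_arr))`; the one-liner just fuses the set construction into the argument expression.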

    Update:

    Mapping also has linear complexity: it iterates over the whole collection and applies the function once per element. Since a linear function grows more slowly than n*log(n), the extra linear pass does not affect the O(n*log(n)) bound, so the asymptotic complexity remains the same even with the mapping.
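    As a concrete illustration (the file list here is made up), this mirrors the `sorted(map(os.path.split, ...))` pattern from the question's second snippet: the map is one O(n) pass over the entries, followed by the O(n*log(n)) sort, so the total stays O(n*log(n)):

    ```python
    import os

    # Hypothetical file paths; os.path.split is applied once per entry
    # (O(n)), then the resulting (head, tail) tuples are sorted
    # (O(n*log(n))), which dominates.
    files = ['b/2.txt', 'a/1.txt', 'a/3.txt']
    print(sorted(map(os.path.split, files)))
    # [('a', '1.txt'), ('a', '3.txt'), ('b', '2.txt')]
    ```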