Tags: algorithm, sorting, time-complexity, big-o, complexity-theory

Understanding Big O complexity


I'm having a tough time understanding Big O time complexity.

Formal definition of Big O:

f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.

And the worst-case time complexity of insertion sort is O(n^2).

I want to understand what f(n), g(n), c, and k are here in the case of insertion sort.


Solution

  • Explanation

    It is not that easy to formalize an algorithm such that you can apply Big-O formally; it is a mathematical concept and does not easily translate to algorithms. Typically you would measure the amount of "computation steps" needed to perform the operation, based on the size of the input. A step-counting sketch for insertion sort follows the list below.

    • So f is the function that measures how many computation steps the algorithm performs.
    • n is the size of the input, for example 5 for a list like [4, 2, 9, 8, 2].
    • g is the function you measure against, so g = n^2 if you check for O(n^2).
    • c and k heavily depend on the specific algorithm and on what exactly f looks like.
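
    Since the question asks specifically about insertion sort, here is a minimal step-counting sketch (not part of the original answer; counting element shifts as the "computation steps" is an arbitrary but common choice):

    public class InsertionSortSteps {
        // Count element shifts as the "computation steps" f(n),
        // where n = a.length.
        static long sortCountingShifts(int[] a) {
            long shifts = 0;
            for (int i = 1; i < a.length; i++) {
                int key = a[i];
                int j = i - 1;
                while (j >= 0 && a[j] > key) { // shift larger elements right
                    a[j + 1] = a[j];
                    j--;
                    shifts++;
                }
                a[j + 1] = key;
            }
            return shifts;
        }

        public static void main(String[] args) {
            int[] worst = {5, 4, 3, 2, 1}; // reverse order = worst case
            System.out.println(sortCountingShifts(worst)); // prints 10 = 5 * 4 / 2
        }
    }

    In the worst case (reverse-sorted input) this performs f(n) = n(n-1)/2 shifts, and n(n-1)/2 <= 1 * n^2 for all n >= 0, so c = 1 and k = 0 already witness O(n^2). The simpler example below walks through the same kind of reasoning in more detail.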

    Example

    The biggest issue with formalizing an algorithm is that you cannot really tell exactly how many computation steps are performed. Let's say we have the following Java code:

    public static void printAllEven(int n) {
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) {
                System.out.println(i);
            }
        }
    }
    

    How many steps does it perform? How deep should we go? What about for (int i = 0; i < n; i++)? Those are multiple statements that are executed during each iteration of the loop. What about i % 2? Can we assume this is a "single operation"? On which level, one CPU cycle? One assembly instruction? What about the println(i), how many computation steps does it need, 1 or 5 or maybe 200?

    This is not practical. We do not know the exact amounts, so we have to abstract and say the loop bookkeeping takes some constant A steps, the check some constant B steps, and the println some constant C steps, which is okay since each of them runs in constant time.
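
    To make the abstraction concrete, here is the same loop again with those placeholder constants attached as comments (A, B, and C are names for unknown but fixed step counts, not measured values):

    public static void printAllEven(int n) {
        for (int i = 0; i < n; i++) {   // loop bookkeeping: A steps, n times
            if (i % 2 == 0) {           // evenness check:   B steps, n times
                System.out.println(i);  // printing:         C steps, n/2 times
            }
        }
    }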


    After simplifying the analysis, we can say that we are effectively only interested in how often println(i) is called.

    This leads to the observation that we call it roughly n / 2 times (exactly ceil(n / 2), since that is how many even numbers lie between 0 and n - 1).
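
    A quick sanity check (a hypothetical helper, not in the original answer) that counts the calls instead of printing:

    class PrintCount {
        // Counts how often println(i) would fire for a given n.
        static long countPrints(int n) {
            long count = 0;
            for (int i = 0; i < n; i++) {
                if (i % 2 == 0) {
                    count++;
                }
            }
            return count;
        }

        public static void main(String[] args) {
            System.out.println(countPrints(10)); // 5
            System.out.println(countPrints(7));  // 4, i.e. ceil(n / 2)
        }
    }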

    The formula for f, using the above constants, would then look something like

    n * A + n * B + n/2 * C
    

    But since constant factors do not really play any role (they vanish in c), we can just ignore them and simplify.


    Now you are left with proving that n / 2 is in O(n^2), for example. By doing that, you will also get concrete numbers for c and k. Example:

    n / 2 <= n <= 1 * n^2 // for all n >= 0
    

    So by choosing c = 1 and k = 0 you have proven the claim. Other values for c and k work as well, for example:

    n / 2 <= 100 * n <= 5 * n^2 // for all n >= 20
    

    Here we have chosen c = 5 and k = 20.
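
    Such a choice can also be checked numerically over a range of n. A brute-force check is not a proof, but it makes the roles of c and k tangible (illustrative sketch):

    class ConstantCheck {
        public static void main(String[] args) {
            // Verify n / 2 <= 100 * n <= 5 * n^2 for n = 20 .. 1_000_000.
            for (long n = 20; n <= 1_000_000; n++) {
                if (!(n / 2.0 <= 100.0 * n && 100.0 * n <= 5.0 * n * n)) {
                    throw new AssertionError("violated at n = " + n);
                }
            }
            System.out.println("c = 5, k = 20 hold for all tested n");
        }
    }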


    You could play the same game with the full formula as well and get something like

    n * A + n * B + n/2 * C
        <= n * (A + B + C)
        = D * n    // with D := A + B + C
        <= D * n^2 // for all n > 0
    

    with c = D and k = 0.

    As you can see, the exact constants do not really play any role; they just vanish in c.