Tags: algorithm, data-structures, time-complexity, complexity-theory

Why is big-O not always a worst-case analysis of an algorithm?


I am trying to learn analysis of algorithms, and I am stuck on the relation between asymptotic notation (big O, ...) and the cases (best, worst, and average).

I have learned that Big O notation defines an upper bound for an algorithm, i.e. it states that the function cannot grow faster than that bound.

At first that sounded to me as if it calculates the worst case. I googled around ("why is worst case not big O?") and got plenty of answers, but they were not so simple for a beginner to understand.

I concluded as follows: Big O is not always used to represent the worst-case analysis of an algorithm because, for an algorithm that takes O(n) execution steps on its best, average, and worst inputs (for example, finding the maximum of an unsorted array), the best, average, and worst cases can all be expressed as O(n).

Please tell me whether I am correct or missing something, as I don't have anyone to validate my understanding. Please also suggest a better example for understanding why Big O is not always the worst case.


Solution

  • Big-O?

    First let us see what Big O formally means:

    In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows.

    This means that Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation. Here, O means the order of the function, and it only provides an upper bound on the growth rate of the function. Formally, f(x) = O(g(x)) means that there exist constants c > 0 and x0 such that f(x) <= c * g(x) for all x >= x0.


    Now let us look at the rules of Big O:

    • If f(x) is a sum of several terms and one of them has the largest growth rate, that term can be kept and all others omitted.
    • If f(x) is a product of several factors, any constants (terms in the product that do not depend on x) can be omitted.

    Example:

    f(x) = 6x^4 − 2x^3 + 5

    Using the 1st rule, we can write it as f(x) = 6x^4.

    Using the 2nd rule, this gives us O(x^4).
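
    As a quick sanity check (a sketch, not part of the original answer), a few lines of Python show why the lower-order term and the constant factor get dropped: the ratio f(x) / x^4 settles toward the constant 6 as x grows, which is exactly what "f(x) grows no faster than x^4, up to a constant" means.

    ```python
    # Numeric illustration that f(x) = 6x^4 - 2x^3 + 5 is O(x^4):
    # the ratio f(x) / x^4 approaches the constant 6, so a bound such
    # as f(x) <= 7 * x^4 holds for all sufficiently large x.

    def f(x):
        return 6 * x**4 - 2 * x**3 + 5

    for x in [10, 100, 1_000, 10_000]:
        print(x, f(x) / x**4)   # the ratio tends toward 6
    ```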


    What is Worst Case?

    Worst-case analysis gives the maximum number of basic operations that have to be performed during execution of the algorithm. It assumes that the input is in the worst possible state, so that maximum work has to be done to put things right.

    For example, for a sorting algorithm that aims to sort an array in ascending order, the worst case occurs when the input array is in descending order. In this case, the maximum number of basic operations (comparisons and assignments) has to be performed to put the array in ascending order; a concrete sketch follows the list below.

    How costly that worst case is depends on a lot of things, like:

    • CPU (time) usage
    • memory usage
    • disk usage
    • network usage
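
    Here is a minimal sketch of the sorting example (insertion sort is chosen for illustration; the answer above speaks of sorting in general) that counts the basic operations for a best-case and a worst-case input:

    ```python
    def insertion_sort(arr):
        """Sort arr in ascending order in place; return the comparison count."""
        comparisons = 0
        for i in range(1, len(arr)):
            key = arr[i]
            j = i - 1
            while j >= 0:
                comparisons += 1
                if arr[j] <= key:
                    break              # key is already in position
                arr[j + 1] = arr[j]    # shift the larger element right
                j -= 1
            arr[j + 1] = key
        return comparisons

    n = 1_000
    print(insertion_sort(list(range(n))))         # ascending input: n - 1 = 999
    print(insertion_sort(list(range(n, 0, -1))))  # descending input: n(n-1)/2 = 499500
    ```

    The already-sorted input needs only n - 1 comparisons, while the descending input forces every element past all of its predecessors, which is exactly the "maximum work to put things right" described above.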

    What's the difference?

    Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm, but big-O notation doesn’t imply anything of the sort.

    The important point here is that we are talking in terms of growth, not absolute numbers of operations. However, with algorithms, we do talk about the number of operations relative to the input size.

    Big-O is used for making statements about functions. The functions can measure time or space or cache misses or rabbits on an island or anything or nothing. Big-O notation doesn’t care.

    In fact, when used for algorithms, big-O is almost never about time. It is about primitive operations.

    When someone says that the time complexity of MergeSort is O(n log n), they usually mean that the number of comparisons that MergeSort makes is O(n log n). That in itself doesn’t tell us what the time complexity of any particular MergeSort might be, because that would depend on how much time it takes to make a comparison. In other words, the O(n log n) refers to comparisons as the primitive operation.
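
    As an illustration (a sketch, not code from the original answer), here is a MergeSort instrumented to count comparisons, the primitive operation that the O(n log n) claim is actually about:

    ```python
    import random

    def merge_sort(arr):
        """Return (sorted copy of arr, number of element comparisons made)."""
        if len(arr) <= 1:
            return arr, 0
        mid = len(arr) // 2
        left, c1 = merge_sort(arr[:mid])
        right, c2 = merge_sort(arr[mid:])
        merged, c3 = [], 0
        i = j = 0
        while i < len(left) and j < len(right):
            c3 += 1                        # one primitive operation
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:])            # leftovers need no comparisons
        merged.extend(right[j:])
        return merged, c1 + c2 + c3

    n = 1024
    _, comparisons = merge_sort([random.random() for _ in range(n)])
    print(comparisons)   # on the order of n * log2(n) (= 10240 here)
    ```

    Counting comparisons deliberately says nothing about how long each comparison takes; that part is supplied by the model of computation discussed next.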

    The important point here is that when big-O is applied to algorithms, there is always an underlying model of computation. The claim that the time complexity of MergeSort is O(n log n) is implicitly referencing a model of computation where a comparison takes constant time and everything else is free.

    Example:

    If we are sorting strings that are k bytes long, we might take “read a byte” as a primitive operation that takes constant time, with everything else being free.

    In this model, MergeSort makes O(n log n) string comparisons, each of which makes O(k) byte comparisons, so the time complexity is O(k · n log n). One common implementation of RadixSort will make k passes over the n strings, with each pass reading one byte, and so has time complexity O(n · k).
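
    To make that comparison concrete, here is a sketch of the RadixSort variant described above (an LSD radix sort over fixed-length byte strings; the sample data is made up for illustration). It makes k passes over the n strings and reads exactly one byte per string per pass, so it performs O(n · k) byte reads in total:

    ```python
    def radix_sort(strings, k):
        """LSD radix sort for byte strings that are all exactly k bytes long."""
        for pos in range(k - 1, -1, -1):        # one pass per byte position
            buckets = [[] for _ in range(256)]  # one bucket per byte value
            for s in strings:
                buckets[s[pos]].append(s)       # read exactly one byte of s
            # Concatenating the buckets in order keeps each pass stable,
            # which is what makes the pass-by-pass approach correct.
            strings = [s for bucket in buckets for s in bucket]
        return strings

    data = [b"banana", b"apple!", b"cherry", b"damson"]  # all k = 6 bytes
    print(radix_sort(data, 6))
    # [b'apple!', b'banana', b'cherry', b'damson']
    ```

    In the "read a byte" model, neither algorithm is universally better: RadixSort's O(n · k) beats MergeSort's O(k · n log n) here, but only because we fixed byte reads as the primitive operation, which is the answer's whole point.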