Search code examples
sortingtime-complexitybig-o

Deriving O(N*log(N)) for Comparison Sort, question on one particular step in wikipedia's derivation


I'm brushing up on my Big O notation via 'Cracking the Code Interview', 6th Ed., and in the chapter on Big O notation, he poses in example 8 that comparison sorts are generally O(n*log(n)).
I wished to understand this derivation, and looked to the wikipedia page on comparison sort here, under the section "Number of comparisons required to sort a list", the article does the following:

By looking at the first n/2 factors of n!, we get: log2(n!) >= log2((n/2)^(n/2))

I do not understand how this exponent was derived. The base makes sense to me since we're explicitly looking at the first n/2 factors, but I am failing to grasp how one might arrive at an exponent of n/2.

Thank you for any and all explanations in advance!

To me, it makes sense after the fact, like if you already knew you were dealing with logarithms as a potentiality. I guess I should be considering that always as a possibility when attempting to derive the time complexity of an algorithm. I wrote out the first n/2 factors of 6! as an example to myself.

6*5*4 = 120
(6/2)^(6/2) -> 3^3 = 27

And yeah, 120 >= 27, but still the reasoning behind the choice to use log2((n/2)^(n/2)) instead of another variation on a log escapes me still.


Solution

  • The goal of the derivation is to get a lower bound on the value of log(𝑛!). There are obvious lower bounds, like log(1), log(𝑛),...etc, just by omitting almost all factors in 𝑛! But we want to find a more tight lower bound. The approach that is taken here, is to eliminate half of the factors in 𝑛!, namely the smaller ones. So first write out 𝑛! with all its 𝑛 factors:

          𝑛! = 𝑛(𝑛−1)(𝑛−2) ... (⌈𝑛/2⌉+1)(⌈𝑛/2⌉)(⌈𝑛/2⌉−1) ... (3)(2)(1)

    Then omit the second half of these factors to keep only the first ⌊𝑛/2⌋ factors, so that surely we arrive at a value that is not greater than the original:

          𝑛! ≥ 𝑛(𝑛−1)(𝑛−2) ... (⌈𝑛/2⌉+1)

    We continue to simplify it more by replacing these factors all by ⌊𝑛/2⌋, so that we get ⌊𝑛/2⌋ factors that are all the same. Again this expression can not be greater than what we already had, as we replaced each factor (if any) with a smaller one (From now on 𝑛/2 is short for ⌊𝑛/2⌋):

          𝑛! ≥ (𝑛/2)(𝑛/2)(𝑛/2) ... (𝑛/2)

    which is the same as saying:

          𝑛! ≥ (𝑛/2)𝑛/2

    Back to the log2(𝑛!) we started with. As the logarithm is an always increasing function, we can apply the logarithm to both sides of the comparison:

          log2𝑛! ≥ log2[(𝑛/2)𝑛/2]

    Not your question, but the reasoning continues by applying some of the logarithmic properties:

          log2𝑛! ≥ (𝑛/2)log2(𝑛/2)

          log2𝑛! ≥ (𝑛/2)(log2𝑛 − log22)

          log2𝑛! ≥ (𝑛/2)(log2𝑛 − 1)

          log2𝑛! ≥ (𝑛/2)log2𝑛 − 𝑛/2

    Practical example

    Let's take 𝑛=12, then 𝑛! is 479001600, which is 12⋅11⋅10⋅9⋅8⋅7⋅6⋅5⋅4⋅3⋅2⋅1

    If we only keep the first half of these factors in the product, we surely get a smaller result:

          12⋅11⋅10⋅9⋅8⋅7

    The second step was to replace the remaining factors with 𝑛/2, so we reduce even more to get this:

          6⋅6⋅6⋅6⋅6⋅6

    As we made all factors smaller than they were, we surely get a smaller product, i.e. 46656

    This remains true if we apply the logarithm to both the original value (𝑛!) and this value ((𝑛/2)𝑛/2):

          log2(12!) = 28.8354552341...

    and

          log2(66) = 15.5097750043...