I'm reading this paper Product quantization for nearest neighbor search.
On the last row of table II page 5 it gives
the complexity given in this table for searching the k smallest elements is the average complexity for n >> k and when the elements are arbitrarily ordered
which is n+klogkloglogn
.
I guess we can use linear selection algorithm to get unsorted k nearest neighbors with O(n)
, and sort the k nearest neighbors with O(klogk)
, so can we have O(n+klogk)
in total. But where does the loglogn
term come from?
The paper gives a reference to the TAOCP book for this, but I don't have the book at hand, could anyone explain it for me?
First, Table II reports the complexity of each of the step, therefore you have to add all the terms to measure the complexity of ADC.
In the last line of the table, it is a single complexity both for SDC and ADC, which is:
n + k log k log log n
The term corresponds to the average algorithmic cost of the selection algorithm that we employ to find the k smallest values in a set of n variables, that we have copy/pasted from Donald Knuth book [25].
I don't have the book in hands to I can not check, but it sounds right.
From the authors