For finding top K elements using heap, which approach is better?
I did some calculations and at no point, I see that NLogK better than KlogN.
N= 16 (2^4), k = 8 (2^3)
O(Nlog(K)) = 16* 3 = 48
O(Klog(N)) = 8 * 4 = 32
N= 16 (2^4), k = 12 (log to base 2 = 3.5849)
O(Nlog(K)) = 16* 3.5849 = 57.3584
O(Klog(N)) = 12 * 4 = 48
N= 256 (2^8), k = 4 (2^2)
O(Nlog(K)) = 256* 2 = 512
O(Klog(N)) = 4 * 8 = 32
N= 1048576 (2^20), k = 16 (2^4)
O(Nlog(K)) = 1048576* 4 = 4194304
O(Klog(N)) = 16 * 20 = 320
N= 1048576 (2^20), k = 1024 (2^10)
O(Nlog(K)) = 1048576* 10 = 10485760
O(Klog(N)) = 1024 * 20 = 20480
N= 1048576 (2^20), k = 524288 (2^19)
O(Nlog(K)) = 1048576* 19 = 19922944
O(Klog(N)) = 524288 * 20 = 10485760
I just wanted to confirm that my approach is correct and adding all elements to heap and extract top k elements is always the best approach. (and also simpler one)
I did some calculations and at no point, I see that NLogK better than KlogN.
Since, K <= N, NlogK will always be more than or equal to KlogN.
That does not mean, min heap approach is going to take more time than max heap approch.
Need to consider the below,
In Min heap approach, we will update the heap only if the next value is larger than the head. If the array is in ascending order, we will do it (N-K) times, if it is descending order we will not update it at all. On average, the number of times the tree gets updated is considerable less than N.
In Max heap, you need to heapify the tree of size N. If K is negligibly small when compared to N, then this time can become a dominant factor. While in the case of min heap, heapify works on the smaller set of K. Also as mentioned in point 1, most of the value of N will not trigger an update of the tree.
I wrote a small program to compare both the approach. The source can be accessed here.
Results for an array ranging from 0 to 1M in random order is below:
Test for Array size: 1000000
k MinHeapIterations MinHeapTime(ms) MaxHeapTime(ms) MaxTime/MinTime
1 15 6.07 72.03 11.88
10 114 3.85 70.09 18.19
100 913 4.11 69.60 16.93
1000 6874 5.32 72.94 13.71
10000 46123 16.52 79.89 4.83
100000 230385 78.19 132.27 1.69
1000000 0 35.86 453.57 12.65
As you can see,