When searching an element or an insertion point in a sorted array, there are basically two approaches: straight search (element by element) or binary search. From the time complexities O(n)
vs O(log(n))
we know that binary search is ultimately more efficient, however this does not automatically imply that binary search will always be faster than "normal search".
My question therefore is: Can binary search be practically less efficent than "normal" search for low n
? If yes, can we estimate the point at which binary search will be more efficient?
Thanks!
Yes, a binary search can be practically less efficient than "normal" search for a small n
. However, this is very hard to estimate the point at which a binary search will be more efficient (if even possible) because this is very dependent of the problem (eg. data type, search predicate), the hardware (eg. processor, RAM) and even the dynamic state of the hardware used when the search is performed as well as the actual data in the sorted array on modern systems.
The first reason a binary search can be less efficient is vectorization. Indeed, modern processors can support SIMD instructions working on pretty big vectors. Thus, a linear search can work simultaneously on many item per processing cycle. Modern processors can even often execute few SIMD instructions in parallel per cycle. While linear searches can be often trivially vectorized, it is not the case of binary searches which are almost inherently sequential. One should keep in mind that vectorization is not always possible nor always automatically done by compilers, especially on non-trivial data types (eg. composite data structures, pointer-based types) or non-trivial search predicates (eg. the ones with conditionals or memory indirections).
The second reason a binary search can be less efficient is branch predictability. Indeed, modern processors try to predict branches ahead of time to avoid pipeline stall. If this prediction works, then branches can be taken very quickly, otherwise the processor can stall for several cycles (up to dozens). A branch can be easily predicted if it is always true or always false. A randomly taken branch cannot be predicted causing stalls. Because the array is sorted, branches in linear searches are easy to predict (branches are either always taken or never taken until the element is found), while this is clearly not the case for binary searches. As a result, the speed of a search is dependent of the searched item, and data inside the sorted array.
The same thing apply for cache misses and memory fetches: because the latency of the RAM is very big compared to executing arithmetic instructions, modern processors contains dedicated hardware prefetching units trying to predict the next memory fetches and prefetch data ahead of time in order to avoid cache misses. Prefetchers are good to predict linear/contiguous memory accesses but very bad for random memory accesses. Memory accesses of linear searches are trivial while the one of binary searches appear to be mostly random for many processors. A cache miss happening during a binary search will certainly cause the processor to stall for a lot of cycles. If the sorted array is already loaded in cache, a binary search on it can be much faster.
But this is not enough: using wide SIMD instructions or doing cache-misses can impact the frequency of the computing core and so the speed of the algorithm. Not to mention that the size of the data type also matters a lot as the memory throughput is limited and strided memory accesses are slower than contiguous one. One should also take into account the additional complexity of binary searches compared to linear ones (ie. often more instructions to execute). I guess I missed some important points in the above list.
As a programmer, you may need to define a threshold to choose which algorithm to use. If you really need that, the best solution is to find is automatically using a benchmark or autotuning methods. Practical experimentations shows that the threshold changed over the last decades for a given fixed context (data type, cache state, etc.), in favour to linear searches (so the thresholds are generally increasing over time).
My personal advice is not to use a binary search for value of n
smaller than 256 / data_type_size_in_bytes
with trivial/native data types on mainstream processors. I think it is a good idea to use a binary search when n
is bigger than 1000, or also when the data-type is non-trivial as well as when the predicate is expensive.