So I stumbled upon non-comparison-based sorting algorithms, bucket sort to be exact, and I couldn't quite get why it is good.
I have a thought, but I need somebody to confirm it.
Let's assume I want to sort a 1000-element array whose values are uniformly distributed, and I split it into 10 buckets of 100 elements each.
Sorting 100 elements 10 times with an n log(n) algorithm costs 10 * 100 log(100) = 1000 log(100) = 2000 (using base-10 logs),
while sorting 1000 elements with an n log(n) algorithm costs 1000 log(1000) = 3000.
So the algorithm exploits the fact that if n = m + l, then (m + l)^2 > m^2 + l^2, and the same holds for n log(n): (m + l) log(m + l) > m log(m) + l log(l).
So the more uniformly the data is distributed across the buckets, the better bucket sort performs.
Is this right?
And what would the optimum number of buckets be? (I feel it's a space-time trade-off, but it also depends on how uniform the data being sorted is.) A sketch of what I mean is below.
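Here's a minimal sketch of the kind of bucket sort I have in mind, assuming the values are floats uniformly distributed in [0, 1); the choice of 10 buckets and of Python's built-in sort inside each bucket is just for illustration:

```python
import random

def bucket_sort(values, num_buckets=10):
    """Sort floats assumed to lie in [0, 1) by scattering them into
    num_buckets buckets, sorting each bucket, and concatenating."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Map the value to a bucket index; uniform data spreads evenly.
        buckets[int(v * num_buckets)].append(v)
    result = []
    for b in buckets:
        # Each bucket is sorted with a comparison sort (Timsort here).
        result.extend(sorted(b))
    return result

data = [random.random() for _ in range(1000)]
assert bucket_sort(data) == sorted(data)
```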
But you have to take into account that the bucketing step itself has a cost of 1000. This gives you:
bucket sort: 1000 + 10 * 100 log(100) = 3000
plain n log(n) sort: 1000 log(1000) = 3000
But you can apply the bucketing strategy recursively to sort the smaller arrays. That is https://en.wikipedia.org/wiki/Radix_sort .
The complexity advertised is O(n·w), where w is the number of bits needed to represent an element. Linear? Better than merge sort? Wait a minute, how big is w usually? Right: for typical inputs of n distinct elements you need about log(n) bits to represent them, so you are back to n log(n).
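To make the idea concrete, here is a sketch of an LSD (least-significant-digit) radix sort for non-negative integers; the base-10 choice and the function name are mine, not something from the linked article:

```python
def radix_sort(values, base=10):
    """LSD radix sort for non-negative integers: bucket by each digit,
    from least to most significant. Runs in O(n * w), where w is the
    number of base-`base` digits of the largest value."""
    if not values:
        return []
    max_val = max(values)
    place = 1
    result = list(values)
    while place <= max_val:
        buckets = [[] for _ in range(base)]
        for v in result:
            buckets[(v // place) % base].append(v)
        # Concatenating in bucket order keeps each pass stable, which is
        # what makes the digit-by-digit passes correct overall.
        result = [v for b in buckets for v in b]
        place *= base
    return result

data = [170, 45, 75, 90, 802, 24, 2, 66]
assert radix_sort(data) == sorted(data)
```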
As you said, this is a time/memory trade-off though, and radix sort is what you use when you have a fixed memory budget (but who doesn't?). If you can grow your memory linearly with the input size, take n buckets and you have an O(n) sort.
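For completeness, a sketch of that memory-for-time trade in the simplest case: a counting sort over non-negative integers, with one bucket per possible key, giving O(n + k) time and O(k) extra space (k being the key range; the names here are just for illustration):

```python
def counting_sort(values, key_range):
    """Sort non-negative integers < key_range by counting occurrences:
    O(n + k) time, O(k) extra space, no comparisons at all."""
    counts = [0] * key_range
    for v in values:
        counts[v] += 1
    # Emit each key as many times as it was seen.
    return [k for k in range(key_range) for _ in range(counts[k])]

assert counting_sort([3, 1, 4, 1, 5, 9, 2, 6], key_range=10) == [1, 1, 2, 3, 4, 5, 6, 9]
```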
An example reference (there are many!): https://www.radford.edu/nokie/classes/360/Linear.Sorts.html .