algorithm, sorting, bucket-sort

What makes Bucket Sort good?


So I stumbled upon non-comparison-based sorting algorithms, bucket sort to be exact, and I couldn't quite get why it is good.

I have a thought, but I need somebody to confirm it.

Let's assume I want to sort a 1000-element array. If the data were uniformly distributed and bucketed into 10 buckets, each bucket would hold 100 elements.

Sorting 100 elements 10 times with an n log(n) algorithm: 10 * 100 log(100) = 1000 log(100) = 2000

while sorting all 1000 elements at once with an n log(n) algorithm: 1000 log(1000) = 3000
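
A quick back-of-the-envelope check of those numbers (a sketch assuming base-10 logarithms, which is what makes log(100) = 2 and log(1000) = 3):

```python
import math

n = 1000                    # total number of elements
buckets = 10                # number of buckets
per_bucket = n // buckets   # 100 elements per bucket if the data is uniform

# Cost of sorting each bucket with an n log(n) algorithm, summed over all buckets.
# Base-10 logs are used only to match the numbers above.
bucketed_cost = buckets * per_bucket * math.log10(per_bucket)  # 10 * 100 * 2 = 2000

# Cost of sorting the whole array at once with an n log(n) algorithm.
direct_cost = n * math.log10(n)                                # 1000 * 3 = 3000

print(bucketed_cost, direct_cost)  # 2000.0 3000.0
```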

So the algorithm exploits the fact that if n = m + l (with m, l > 0), then (m + l)^2 > m^2 + l^2, and the same applies to n log(n) algorithms: (m + l) log(m + l) > m log(m) + l log(l).

So the more uniformly the data is bucketed, the better bucket sort performs.

Is this right?

And what would the optimum number of buckets be? (I feel it's a space-time trade-off, but it also depends on how uniform the data being sorted is.)
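
For concreteness, here is a minimal bucket sort sketch of what I mean (assuming the keys are floats uniformly distributed in [0, 1); the bucket count is a parameter so the trade-off above can be experimented with):

```python
def bucket_sort(values, num_buckets=10):
    """Minimal bucket sort for float keys assumed to lie in [0, 1)."""
    buckets = [[] for _ in range(num_buckets)]

    # Distribution pass: O(n). Uniform keys spread evenly across the buckets.
    for v in values:
        buckets[int(v * num_buckets)].append(v)

    # Sort each (hopefully small) bucket and concatenate them in order.
    result = []
    for b in buckets:
        result.extend(sorted(b))
    return result

print(bucket_sort([0.42, 0.07, 0.91, 0.33, 0.68]))  # [0.07, 0.33, 0.42, 0.68, 0.91]
```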


Solution

  • But you have to take into account that the bucketing step itself costs about 1000 operations (one pass to distribute the elements into buckets). This gives you:

    • bucket sort: 1000 + 10 * 100 log(100) = 3000
    • comparison sort: 1000 * log(1000) = 3000

    But you can apply the bucketing strategy again, recursively, to sort the smaller arrays. This is https://en.wikipedia.org/wiki/Radix_sort .
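
    A minimal LSD radix sort sketch of that repeated-bucketing idea (assuming non-negative integer keys; base 10 is an arbitrary choice here):

```python
def radix_sort(values, base=10):
    """LSD radix sort for non-negative integers."""
    if not values:
        return values
    max_value = max(values)
    exp = 1
    while max_value // exp > 0:
        # Bucket by the current digit; appending preserves order, so each pass is stable.
        buckets = [[] for _ in range(base)]
        for v in values:
            buckets[(v // exp) % base].append(v)
        # Flatten the buckets back into one list before moving to the next digit.
        values = [v for b in buckets for v in b]
        exp *= base
    return values

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```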

    The advertised complexity is O(n·w), where w is the number of bits needed to represent an element. Linear? Better than merge sort? Wait a minute, how big is w usually? Yeah, right: for typical data with n distinct elements you need at least log(n) bits to tell them apart, so you are back to n log(n).

    As you said, this is a time/memory trade-off though, and radix sort is what you use when you have a fixed memory budget (but who doesn't?). If you can grow your memory linearly with the input size, take n buckets and you have an O(n) sort.
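
    That n-buckets extreme is essentially counting sort; a sketch, assuming small non-negative integer keys (one bucket, here just a counter, per possible key value):

```python
def counting_sort(values, max_key):
    """Counting sort: one counter per possible key, assuming 0 <= key <= max_key."""
    counts = [0] * (max_key + 1)
    for v in values:                         # O(n) counting pass
        counts[v] += 1
    result = []
    for key, count in enumerate(counts):     # O(max_key) output pass
        result.extend([key] * count)
    return result

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], max_key=9))
# [1, 1, 2, 3, 4, 5, 6, 9]
```

    The total work is O(n + max_key), which stays O(n) as long as the key range grows at most linearly with the input size.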

    An example reference (there are many!): https://www.radford.edu/nokie/classes/360/Linear.Sorts.html .