Search code examples
algorithmsortingbucket

What is the most efficient way to sort a collection of items into buckets?


I have an array of arbitrary hashes, with an element of the hash an integer (call it 'id'). I want to sort these hashes into a number of buckets (constant over the array), where each bucket is an arbitrary range of 'ids' (e.g. 1-10, 15-20, 20-30). What is the best sorting strategy to do this? Is it possible to do without a nested loop?


Solution

  • If the number of buckets is small, you are probably better off with the nested loops. The outer loop over the hashes, and the inner over the buckets. O(n*m).

    If the number of hashes, and the number of buckets are large, you can:

    hashes = sort(hashes)
    buckets = sort(buckets) # sort by lower-bound of bucket
    i = 0
    
    foreach (hash in hashes) {
      while (buckets[i].lower_bound > hash) {
        i = i + 1
      }
      bucket[i].add(hash)
    }
    

    The basically loops through the hashes adding them to the current bucket and advancing to the next bucket when needed. O(n*log(n) + m*log(m))