I have an array of arbitrary hashes, with an element of the hash an integer (call it 'id'). I want to sort these hashes into a number of buckets (constant over the array), where each bucket is an arbitrary range of 'ids' (e.g. 1-10, 15-20, 20-30). What is the best sorting strategy to do this? Is it possible to do without a nested loop?
If the number of buckets is small, you are probably better off with the nested loops. The outer loop over the hashes, and the inner over the buckets. O(n*m)
.
If the number of hashes, and the number of buckets are large, you can:
hashes = sort(hashes)
buckets = sort(buckets) # sort by lower-bound of bucket
i = 0
foreach (hash in hashes) {
while (buckets[i].lower_bound > hash) {
i = i + 1
}
bucket[i].add(hash)
}
The basically loops through the hashes adding them to the current bucket and advancing to the next bucket when needed. O(n*log(n) + m*log(m))