Search code examples
algorithmlarge-data

Divide Huge Array of Numbers in Buckets


Given a Huge Array of Numbers( Say 1 trillion ). How can you divide them in say N Buckets each having some range. Cases : 1. Assuming Distribution is Non-Uniform 2. Assuming Distribution is Uniform.


Solution

  • If non uniform, and you want equal # of buckets, simply make your own hashtable w/ your own lambda value.

    If you have 1000 numbers from 1-1000, and you want 10 buckets, simply give a hashcode of 1-100 to be 0, and 101-200 to be 1. This is really easy to do - you can just do (maxNum (in first instance it's 100) -1)/100 (which is 1000/numOfBuckets) to find the index of the array inside of your hash table.

    If you want to have even distribution - that's a little harder. You're going to have to first take in with the previous uneven distribution, and then rehash so each bucket has the same # of numbers.

    To rehash, simply take the # of numbers (iterate through each bucket, finding size, and add) and then divide by # of buckets. Now you have the new lambda value. If you don't care about the range being not uniform(as in 1-10, 11-20, instead of 1-15, 15-20, etc), then iterate through your old hashtable and add into a new hashtable w/ the new lambda value, filling sequentially - that's the closest you'll get(sometimes you'll get -1 from lambda value).

    If you don't care much about the original uneven distribution but equal range, simply just take the number of numbers you have, sort using some easy sort like quicksort, and then put (lambda value)# of numbers in each bucket.

    Not sure if that's what you mean, but hope this helps.