Search code examples
algorithmdata-structuressortingbucket

Storing a bucket of numbers in an efficient data structure


I have a buckets of numbers e.g. - 1 to 4, 5 to 15, 16 to 21, 22 to 34,.... I have roughly 600,000 such buckets. The range of numbers that fall in each of the bucket varies. I need to store these buckets in a suitable data structure so that the lookups for a number is as fast as possible.

So my question is what is the suitable data structure and a sorting mechanism for this type of problem.

Thanks in advance


Solution

  • If the buckets are contiguous and disjoint, as in your example, you need to store in a vector just the left bound of each bucket (i.e. 1, 5, 16, 22) plus, as the last element, the first number that doesn't fall in any bucket (35). (I assume, of course, that you are talking about integer numbers.)

    Keep the vector sorted. You can search the bucket in O(log n), with kind-of-binary search. To search which bucket does a number x belong to, just go for the only index i such that vector[i] <= x < vector[i+1]. If x is strictly less than vector[0], or if it is greater than or equal to the last element of vector, then no bucket contains it.

    EDIT. Here is what I mean:

    #include <stdio.h>
    
    // ~ Binary search. Should be O(log n)
    int findBucket(int aNumber, int *leftBounds, int left, int right)
    {
        int middle;
    
        if(aNumber < leftBounds[left] || leftBounds[right] <= aNumber) // cannot find
            return -1;
        if(left + 1 == right) // found
            return left;
    
        middle = left + (right - left)/2;
    
        if( leftBounds[left] <= aNumber && aNumber < leftBounds[middle] )
            return findBucket(aNumber, leftBounds, left, middle);
        else
            return findBucket(aNumber, leftBounds, middle, right);
    }
    
    
    #define NBUCKETS 12
    int main(void)
    {
        int leftBounds[NBUCKETS+1] = {1, 4, 7, 15, 32, 36, 44, 55, 67, 68, 79, 99, 101};
        // The buckets are 1-3, 4-6, 7-14, 15-31, ...
    
        int aNumber;
        for(aNumber = -3; aNumber < 103; aNumber++)
        {
            int index = findBucket(aNumber, leftBounds, 0, NBUCKETS);
            if(index < 0)
                printf("%d: Bucket not found\n", aNumber);
            else
                printf("%d belongs to the bucket %d-%d\n", aNumber, leftBounds[index], leftBounds[index+1]-1);
        }   
        return 0;
    }