Search code examples
c#algorithmdesign-patternsdata-structuresfrequency-distribution

What is the fastest way to calculate frequency distribution for array in C#?


I am just wondering what is the best approach for that calculation. Let's assume I have an input array of values and array of boundaries - I wanted to calculate/bucketize frequency distribution for each segment in boundaries array.

Is it good idea to use bucket search for that?

Actually I found that question Calculating frequency distribution of a collection with .Net/C#

But I do not understand how to use buckets for that purpose cause the size of each bucket can be different in my situation.

EDIT: After all discussions I have inner/outer loop solution, but still I want to eliminate the inner loop with a Dictionary to get O(n) performance in that case, if I understood correctly I need to hash input values into a bucket index. So we need some sort of hash function with O(1) complexity? Any ideas how to do it?


Solution

  • Bucket Sort is already O(n^2) worst case, so I would just do a simple inner/outer loop here. Since your bucket array is necessarily shorter than your input array, keep it on the inner loop. Since you're using custom bucket sizes, there are really no mathematical tricks that can eliminate that inner loop.

    int[] freq = new int[buckets.length - 1];
    foreach(int d in input)
    {
        for(int i = 0; i < buckets.length - 1; i++)
        {
             if(d >= buckets[i] && d < buckets[i+1])
             {
                 freq[i]++;
                 break;
             }
        }
    }
    

    It's also O(n^2) worst case but you can't beat the code simplicity. I wouldn't worry about optimization until it becomes a real issue. If you have a larger bucket array, you could use a binary search of some sort. But, since frequency distributions are typically < 100 elements, I doubt you'd see a lot of real-world performance benefit.