Let's say I have a large unsorted array of integers (C/C++) that mostly repeat a small range of values. For example, if I start with the following array:
{ 0, 3, 3, 3, 0, 1, 1, 1, 3, 2, 2, 3, 0, 1, 1, 1, 2, 2, 2, 2, 0, 0, 1}
I'd like to end up with this:
{ 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3}
In actuality, my array will have thousands of elements, but the range of values they can have will still be relatively small, like a dozen or so possible values.
My problem is that traditional sorting algorithms (qsort, mergesort, etc) seem a bit overkill, as they will try to ensure that every single element is in its proper position. But I'm looking for an algorithm that only cares to group elements into "buckets" and knows to terminate as soon as that has been achieved.
Well, based on this:
unsorted array of integers that mostly repeat a small range of values
Assuming that there is a maximal value in your list, you could do this:
#include <stdio.h>
#include <string.h>
int group_vals(int *arr, size_t len, int max)
{
int count[max+1];
memset(count, 0, sizeof count);
for(size_t i = 0; i < len; ++i)
count[arr[i]]++;
size_t index = 0;
for(size_t i = 0; i < max + 1; ++i)
{
for(size_t j = 0; j < count[i]; ++j)
arr[index++] = i;
}
}
int main(void)
{
int arr[] = { 0, 3, 3, 3, 0, 1, 1, 1, 3, 2, 2, 3, 0, 1, 1, 1, 2, 2, 2, 2, 0, 0, 1};
for(size_t i = 0; i < sizeof arr / sizeof *arr; ++i)
printf("%d, ", arr[i]);
puts("");
group_vals(arr, sizeof arr / sizeof *arr, 3);
for(size_t i = 0; i < sizeof arr / sizeof *arr; ++i)
printf("%d, ", arr[i]);
puts("");
return 0;
}
here I know that 3 is the maximal value of the list. This outputs
0, 3, 3, 3, 0, 1, 1, 1, 3, 2, 2, 3, 0, 1, 1, 1, 2, 2, 2, 2, 0, 0, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 1,
edit
NOTE: As user coderredoc pointed out in the comments, the limitation of this approach is that it only works when the original array contains positive numbers only. Improving it to deal with negative numbers is not a big problem:
int group_vals(int *arr, size_t len, int absmax)
{
int count[2*absmax+1];
memset(count, 0, sizeof count);
for(size_t i = 0; i < len; ++i)
{
int v = arr[i];
size_t idx;
if(v == 0)
idx = absmax;
else
idx = absmax + v;
count[idx]++;
}
size_t index = 0;
for(size_t i = 0; i < 2*absmax + 1; ++i)
{
int v;
if(i == absmax)
v = 0;
v = i - absmax;
for(size_t j = 0; j < count[i]; ++j)
{
arr[index++] = v;
}
}
}
Now the function expects the maximum of the absolute values of the array.
This version prints:
-2, 0, 1, 3, 2, 3, -2, -1, -1, 3, 3,
-2, -2, -1, -1, 0, 1, 2, 3, 3, 3, 3,
PS: I didn't read John Zwinck's answer, but we both have the same idea, this is the C version of it.