What does the last `for` loop in this Radix Sort code do?

I was reading the book Learn C The Hard Way by Zed A. Shaw and I was looking over his implementation of the radix sort algorithm.

This is his code:

#define ByteOf(x, y) (((u_int8_t *)x)[y])

static inline void radix_sort(short offset, uint64_t max,
        uint64_t * source, uint64_t * dest)
{
  uint64_t count[256] = { 0 };
  uint64_t *cp = NULL;
  uint64_t *sp = NULL;
  uint64_t *end = NULL;
  uint64_t s = 0;
  uint64_t c = 0;

  // Count occurences of every byte value
  for (sp = source, end = source + max; sp < end; sp++) {
    count[ByteOf(sp, offset)]++;
  }

  // transform count into index by summing
  // elements and storing them into same array.
  for (s = 0, cp = count, end = count + 256; cp < end; cp++) {
    c = *cp;
    *cp = s;
    s += c;
  }

  // fill dest with right values in the right place
  for (sp = source, end = source + max; sp < end; sp++) {
    cp = count + ByteOf(sp, offset);
    printf("dest[%d] = %d\n", *cp, *sp);
    dest[*cp] = *sp;
    ++(*cp);
  }
}

The above is just a helper function. His actual radix sort is done here:

void RadixMap_sort(RadixMap * map)
{
  uint64_t *source = &map->contents[0].raw;
  uint64_t *temp = &map->temp[0].raw;

  radix_sort(0, map->end, source, temp);
  radix_sort(1, map->end, temp, source);
  radix_sort(2, map->end, source, temp);
  radix_sort(3, map->end, temp, source);
}

Here's the structures he's defined:

typedef union RMElement {
  uint64_t raw;
  struct {
    uint32_t key;
    uint32_t value;
  } data;
} RMElement;

typedef struct RadixMap {
  size_t max;
  size_t end;
  uint32_t counter;
  RMElement *contents;
  RMElement *temp;
} RadixMap;

I can understand the first 2 for loops in the inline function radix_sort. As far as I understand, the first one simply just counts the byte values and the second function basically makes a cumulative frequency table, where each entry is the sum of the previous entries.

I still can't wrap my head around the ByteOf(x, y) macro and the third for loop. I've tried reading the Wikipedia page for Radix-sort and I read another article that used a C++ implementation. However, the code written in each of these articles doesn't match the code that he's written.

I understand how Radix Sort works in principle. Basically, we group it according to each digit, rearranging the groupings for every new digit we encounter. For example, to sort the array [223, 912, 275, 100, 633, 120, 380], you first group them by the ones digit so you get [380, 100, 120], [912], [633, 223], [275]. Then you do the same thing with the tens and hundreds place until you've run out of digits.

Any help explaining his code would be appreciated. Thanks.

Solution

ByteOf(x, y) is the same as:

#define ByteOf(x, y)  ((*(x) >> (offset*8)) & 0xff)

That is, it isolates the value of byte #{offset} within a value.

The second loop is a sort of allocator. If the first six counts[] were 1,2,4,0,16,25 after the first loop they would be 0,1,3,7,7,23 after the second. This directs the third loop (over source[]) to layout the destination as:

ByteOf       index       number of values
0            0           1
1            1           2
2            3           4
3            7           0 -- there are none.
4            7           16
5            23          25

I find it a bit clearer to rewrite the third loop as:

  for (i = 0; i < max; i++) {
    dest[count[ByteOf((source+i), offset)]++] = source[i];
  }

I think it shows the relationship more clearly, that is that the ith’ source element is being copied to an index in dest. The index in dest is at the start of the partition (count[]) previously computed for this digit. Since there is now a number at this location, we increment the start of this partition to prevent over-writing it.

Note that the brackets around (source+i) are necessary to get the right address for the cast in ByteOf.