The following Radix sort does four passes of counting sort (256 buckets, 32-bit integers, starting from lowest significant digits), taken from Sedgewick's Algorithms textbook.
public class LSD {
private final static int BITS_PER_BYTE = 8;
// LSD sort an array of integers, treating each int as 4 bytes
// assumes integers are nonnegative
// [ 2-3x faster than Arrays.sort() ]
public static void sort(int[] a) {
int BITS = 32; // each int is 32 bits
int W = BITS / BITS_PER_BYTE; // each int is 4 bytes
int R = 1 << BITS_PER_BYTE; // each bytes is between 0 and 255
int MASK = R - 1; // 0xFF
int N = a.length;
int[] aux = new int[N];
for (int d = 0; d < W; d++) {
// compute frequency counts
int[] count = new int[R+1];
for (int i = 0; i < N; i++) {
int c = (a[i] >> BITS_PER_BYTE*d) & MASK;
count[c + 1]++;
}
// compute cumulates
for (int r = 0; r < R; r++)
count[r+1] += count[r];
// for most significant byte, 0x80-0xFF comes before 0x00-0x7F
if (d == W-1) {
int shift1 = count[R] - count[R/2];
int shift2 = count[R/2];
for (int r = 0; r < R/2; r++)
count[r] += shift1;
for (int r = R/2; r < R; r++)
count[r] -= shift2;
}
// move data
for (int i = 0; i < N; i++) {
int c = (a[i] >> BITS_PER_BYTE*d) & MASK;
aux[count[c]++] = a[i];
}
// copy back
for (int i = 0; i < N; i++)
a[i] = aux[i];
}
}
I understand most of the code except for this part:
if (d == W-1) {
int shift1 = count[R] - count[R/2];
int shift2 = count[R/2];
for (int r = 0; r < R/2; r++)
count[r] += shift1;
for (int r = R/2; r < R; r++)
count[r] -= shift2;
}
What's the purpose of this segment of code? Thanks!
The code block does exactly what the comment says:
for most significant byte, 0x80-0xFF comes before 0x00-0x7F
The reason for this is: since you are using int
, so the most significant bit is the sign bit. Therefore numbers with the most significant byte in the range 0x80-0xFF
are negative numbers, so should be placed before positive numbers, which has the most significant byte in the range of 0x00-0x7F
.
If you are asking how the code block achieves it, here is a brief idea:
Since you understood how data is moved, so I assume you understood what does count[]
do in the entire code. In the code block, R
is the upper bound, which is 0xFF + 1
, and R / 2
is 0x7F + 1
. Therefore count[R] - count[R / 2]
is the total number in the range of 0x80
to 0xFF
. Therefore by adding a shift of count[R] - count[R / 2]
to count[0 .. R / 2]
and subtracting it from count[R / 2 .. R]
will help numbers ranged in 0x00
to 0x7F
have higher count
value than numbers ranged in 0x80
to 0xFF
, which results in 0x80-0xFF comes before 0x00-0x7F ultimately.
Lastly, you may be curious: if the first bit is sign bit, why 11111111
is bigger than 10000001
? Isn't that -(127) < -(1)
? This is because in the computer system, we are using 2's compliment instead of signed integers, therefore 11111111
actually means -1
, and 10000001
actually means -127
.