I want to sort Integers. Therefore I interpret them as binary strings of length 32. Now I am able to apply bucketsort for each component. Here is my implementation in java:
public static void sort(List<Integer> numbers)
{
Queue<Integer> tmp =new LinkedList<Integer>();
for (int i = 0; i < numbers.size(); i++) {
tmp.offer(numbers.get(i));
}
List<Integer>[] bucket = new ArrayList[2];
for (int i = 0; i < bucket.length; i++) {
bucket[i] = new ArrayList<Integer>();
}
for (int k = 32; k > 0; k--) {
// Clear
for (int i = 0; i < bucket.length; i++) {
bucket[i].clear();
}
while (!tmp.isEmpty()) {
Integer firstel =tmp.element();
String el =String.valueOf(firstel);
if (el.charAt(k - 1) == 0) {
bucket[0].add(firstel);
} else {
bucket[1].add(firstel);
}
tmp.remove();
}
for (int i = 0; i < bucket.length; i++) {
for (Integer j : bucket[i]) {
tmp.add(j);
}
}
}
}
I am not sure, whether my code deals right with these binary strings. Do I have to convert the integers from the list numbers to binary strings? Note: It is only for practice. There is no deeper sense concerning time complexity.
Let's first rewrite this a bit, because there is no need for Queues, or a tmp arraylist, etc. Step one, a stub class that lets us write less code:
private class Numlist extends ArrayList<Integer> {
Numlist() { super(); }
}
Done, we no longer need to write ArrayList<Integer>
everywhere.
Now, java does "autoboxing" so anything that you store Integers in can do int
, and vice versa. That's handy, because we can throw away all that string stuff. We don't need it if we just care about bit comparison:
public void sort(Numlist numbers) {
// No copying `numbers` to a `tmp` queue: just work with it directly.
Numlist zeroes, ones;
for (int k = 1; k < 32; k++) {
// Build this step's masking value, which we can use to
// find the value of individual bits by using bitwise AND.
// Also note that we _know_ this is a safe integer operation.
mask = (int) Math.pow(2,k);
// just allocate new lists; it's about as fast as clearing.
zeroes = new Numlist();
ones = new Numlist();
// "scatter": now we empty the numbers list one element at a
// time, and then we'll fill it back up after we emptied it.
while (!numbers.isEmpty()) {
int e = numbers.remove(0);
if ((e & mask) == mask) {
ones.add(e);
} else {
zeroes.add(e);
}
}
// "gather"
for (int i: zeroes) { numbers.add(i); }
for (int i: ones) { numbers.add(i); }
}
}
And with that rewrite, things work. We removed a lot of verbosity which makes it easier to reason about what the code does, and we removed the whole "int to string to substring, then char comparison" which means there's a lot less to go wrong.
In terms of testing, add your function into the following class:
import java.lang.Math;
import java.util.ArrayList;
public class Test {
// private class goes here
public static void main(String[] argv) { new Test(); }
public Test() {
Numlist list = new Numlist();
list.add(10);
list.add(77810);
list.add(4);
list.add(100);
list.add(1);
list.add(290387423);
list.add(23423);
sort(list);
System.out.println(list);
}
// function goes here
}
And done: javac
will happily compile it, and executing should yield [1, 4, 10, 100, 23423, 77810, 290387423]
You'll also note we're not using for (int k = 31; k > 0; k--)
but we're using for (int k=1; k<32; k++)
... why? Does it make a difference?
it makes a huge difference
By running our mask from b000...001 to b100...000 we guarantee that despite "picking values back out", their relative "less than the current bit" ordering is preserved.
If we run our masking the other way, from b1000...000 to b000...001, then at every step we're undoing the ordering we just set up, and the result is not a sorted list at all: [1, 4, 100, 77810, 10, 290387423, 23423]
** edit **: why does masking work?
The integer types byte
, char
, int
and long
are just bit patterns of 1, 2, 4, and 8 bytes, respectively, so all of these are already "binary numbers", we just need a way to access individual bits, which we can do using bitwise masking.
In order to mask off "everything but a specific bit", you can take the bitwise AND of some bit pattern, and a pattern in which only a single bit is set, which we can create really easily since these are just numbers that are "powers of 2".
For instance, to check the bits in the number 23, we can use the following checks:
23 & 1 2 4 8 16 32
b0 1 & 1=1 0=0 0=0 0=0 0=0 0=0
b1 1 & 0=0 1=1 0=0 0=0 0=0 0=0
b2 1 & 0=0 0=0 1=1 0=0 0=0 0=0
b3 0 & 0=0 0=0 0=0 1=0 0=0 0=0
b4 1 & 0=0 0=0 0=0 0=0 1=1 0=0
b5 0 & 0=0 0=0 0=0 0=0 0=0 1=0
Here we can see the number 23, binary 10111
, and the result of masking by each power of two: 23 & 1 yields 1, so we know the first bit is set. We see that 23 & 2 yields 2, so we know the second bit is set. The same for 4, but 23 & 8 yields 0. We know that the fourth bit is not set.
So we can check any length bit pattern by using bitwise AND: if (number & mask) == mask
we know that the bit for that mask is set. If the result is 0 instead, we know the bit was not set.
Also, note that &
is not the same as &&
: &
is a bitwise AND operation, performing an AND for every single bit between the left and right hand side of the operator. The &&
operator is logical AND, requiring the left and right sides to boolean values. The logic AND is effectively "a single AND", whereas the bitwise AND is "as many AND operations as there are bits to test".