I'm currently developing a Java-based library for network coding (http://en.wikipedia.org/wiki/Network_coding). This is very CPU-intensive, so I need some help optimizing the encoding stage. Essentially, I'm creating random linear combinations of the original data, where addition is XOR and multiplication is Galois-field multiplication in GF(2^16).
I've taken the optimizations as far as I can on my own. For instance, I'm using tricks like this one to make the multiplications faster: http://groups.google.com/group/comp.dsp/browse_thread/thread/cba57ae9db9971fd/7cd21eec39ddae1a?hl=en&lnk=gst&q=Sarwate+Galois#7cd21eec39ddae1a
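For readers who haven't seen table-based Galois-field multiplication, here is a minimal sketch of how log/exp tables for GF(2^16) can be built and used. It is not the code from my library: the class name `Gf16Tables`, the primitive polynomial 0x1100B, the generator 2 and the explicit zero check are all assumptions made for illustration.

```java
// Hypothetical helper, for illustration only: log/exp tables for GF(2^16).
final class Gf16Tables {
    static final int FIELD_SIZE = 1 << 16;
    static final int ORDER = FIELD_SIZE - 1;       // size of the multiplicative group
    static final int PRIMITIVE_POLY = 0x1100B;     // assumed: x^16 + x^12 + x^3 + x + 1

    // EXP is doubled in length so that LOG[a] + LOG[b] never needs a modulo.
    static final char[] EXP = new char[2 * ORDER];
    static final int[] LOG = new int[FIELD_SIZE];

    static {
        int x = 1;
        for (int i = 0; i < ORDER; i++) {
            EXP[i] = (char) x;
            EXP[i + ORDER] = (char) x;
            LOG[x] = i;
            x <<= 1;                               // multiply by the generator (2)
            if ((x & FIELD_SIZE) != 0) {
                x ^= PRIMITIVE_POLY;               // reduce modulo the polynomial
            }
        }
    }

    // Multiplication is an addition of logarithms; zero has no logarithm,
    // so it is handled separately in this sketch.
    static int multiply(int a, int b) {
        if (a == 0 || b == 0) {
            return 0;
        }
        return EXP[LOG[a] + LOG[b]];
    }

    // Addition in GF(2^16) is plain XOR.
    static int add(int a, int b) {
        return a ^ b;
    }
}
```

With tables like these, one multiplication costs two lookups and an integer addition, which is why the inner loop of an encoder ends up dominated by array accesses.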
I'm therefore looking for tips on how to optimize this further. It's hard to profile, since the profilers I've used don't give any hints about which operation is the most expensive (e.g. is it the array lookup or the XOR?). So I'm at the point where I'm more or less randomly trying out different ideas and testing whether they improve the overall performance.
More specifically some potential areas of improvement that I need help on are:
Here's the core of the algorithm. It might be hard to understand out of context but if you see any unnecessarily expensive operations I'm doing then please let me know!
    int messageFragmentStart = 0;
    int messageFragmentEnd = fragmentCharSize;
    int coefficientIndex = fragmentID * messageFragmentsPerDataBlock;
    final int resultArrayIndexStart = fragmentID * fragmentCharSize;
    for (int messageFragmentIndex = 0; messageFragmentIndex < messageFragmentsPerDataBlock; messageFragmentIndex++) {
        // One coefficient per message fragment, stored as its logarithm.
        final int coefficientLogValue = coefficientLogValues[coefficientIndex++];
        int resultArrayIndex = resultArrayIndexStart;
        for (int i = messageFragmentStart; i < messageFragmentEnd; i++) {
            // Multiply by adding logarithms, then accumulate with XOR.
            final int logSum = coefficientLogValue + logOfDataToEncode[i];
            final int messageMultipliedByCoefficient = expTable[logSum];
            resultArray[resultArrayIndex++] ^= messageMultipliedByCoefficient;
        }
        // Advance to the next fragment; the last one may be shorter.
        messageFragmentStart += fragmentCharSize;
        messageFragmentEnd = Math.min(messageFragmentEnd + fragmentCharSize, maxTotalLength);
    }
You can't make Java forgo the bounds checking, as it's specified in the JLS. But in most cases the JIT is able to eliminate the check as long as the loop bound is simple (e.g. i < array.length). If it isn't, there's no way to avoid it (well, I assume one could play with unsafe objects?).
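To illustrate the kind of loop shape meant here, a minimal sketch follows. The class and method names are made up for the example, and whether the checks are actually removed depends on the JVM and its JIT, so treat it as something to measure rather than a guarantee.

```java
// Hypothetical example of a loop whose bound is simple and loop-invariant.
final class XorLoop {
    // XORs src into dst over the range both arrays share.
    static void xorInto(char[] dst, char[] src) {
        // The bound is computed once and is never larger than either
        // array's length, so every index used below is provably in range.
        final int n = Math.min(dst.length, src.length);
        for (int i = 0; i < n; i++) {
            dst[i] ^= src[i];
        }
    }
}
```

By contrast, an index that comes out of another array, such as expTable[logSum] in the question, cannot be proven in range from the loop bound alone, so that check most likely stays.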
For your second problem there's this here which should fulfill the purposes just fine.
But anyhow, from your code it seems like this problem is trivial to vectorize, and sadly the JVM isn't very good at that, if it does it at all. Hence implementing the same code in C/C++ using compiler intrinsics (you could even try the auto-vectorization of ICC/GCC) could lead to quite noticeable speedups, assuming we're not completely memory bound. So I'd implement it in C++ and use JNI, just as a reference point.