Search code examples
javaoptimizationbit-manipulationhammingweight

Optimizing Long.bitCount


I have a program that is making a huge number of calls to Long.bitCount(), so many that it is taking 33% of cycles on one CPU core. Is there a way to implement it that is faster than the Sun JDK version?

I have tried:

  • This algorithm (I think this is exactly how the JDK implements it)
  • lookup tables of various sizes between 28 and 222 (looking at a few bits at a time and adding the results)

But I couldn't do any better than a 216-entry lookup table with a manually-unrolled loop (about 27% CPU.)
How else might this be optimized for Java?


Note: this question is about Java-specific optimization, but this similar (language-agnostic) question has many other algorithms.


Solution

  • If you are on a recent x86 CPU there is an instruction for this, popcnt.

    In recent versions of Java, Long.bitCount() uses this instruction. Just use -XX:+UsePopCountInstruction (this is the default in recent versions)

    However, there are some bugs with it in JRE 6.0_u18 through 7.0_u5: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7063674