I wonder, how to efficiently compute the hashCode
for a BitSet
-like implementation of Set<Integer>
.
The BitSet#hashCode
is obviously fast to compute, rather stupid(*) and incompatible with Set#hashCode()
.
A fast compatible implementation could go like
int hashCode() {
int result = 0;
for (int i=0; i<bits.length; ++i) {
long word = bits[i];
result += 64 * i * Long.bitCount(word) + weightedBitCount(word);
}
return result;
}
if there was an efficient implementation of
int weightedBitCount(long word) { // naive implementation
int result = 0;
for (int i=0; i<64; ++i) {
if ((word & (1L << i)) != 0) {
result += i;
}
}
return result;
}
In case most bits are unset, the naive implementation could be improved by testing word==0
or using Long.highestOneBit
or alike, but these tricks don't help much and are detrimental in other cases.
Is there a clever trick to significantly speed it up in general?
I guess, some batch computation over multiple words could be more efficient. Computing Set#size
at the same time would be a nice bonus.
A note concerning premature optimization: Yes, I know. I'm mostly curious (and it can be useful for things like Project Euler).
(*) There are many bits which gets completely ignored (they get shifted out in the multiplication).
This is what I did:
int weightedBitCount(long word) {
return (Long.bitCount(word & 0xFFFF_FFFF_0000_0000L) << 5)
+ (Long.bitCount(word & 0xFFFF_0000_FFFF_0000L) << 4)
+ (Long.bitCount(word & 0xFF00_FF00_FF00_FF00L) << 3)
+ (Long.bitCount(word & 0xF0F0_F0F0_F0F0_F0F0L) << 2)
+ (Long.bitCount(word & 0xCCCC_CCCC_CCCC_CCCCL) << 1)
+ (Long.bitCount(word & 0xAAAA_AAAA_AAAA_AAAAL) << 0);
}
It's pretty simple: With a single bit set, e.g., the bit 10, word
looks like 0x0000_0000_0000_0400L
and only the masks 0xFF00_FF00_FF00_FF00L
and 0xCCCC_CCCC_CCCC_CCCCL
produce a bit count of 1, so we get
(0 << 5) + (0 << 4) + (1 << 3) + (0 << 2) + (1 << 1) + (0 << 5) = 10
It needs some 6*4 instructions (maybe 6 cycles on modern Intel) per 64 bits, so it's not really slow, but it's still too slow when compared with the bulk bitset operations which need a single instruction (per 64 bits).
So I'm playing with some batch computation over multiple words.