I'm trying to write a SWAR compare-for-equality operation, working on uint64_t
pretending to be 8 'lanes' of uint8_t
. The closest I've managed to achieve, based on techniques in Hacker's Delight and Bit Twiddling Hacks, is the following:
uint64_t compare_eq (uint64_t x, uint64_t y) {
uint64_t xored = x ^ y;
uint64_t mask = 0x7F * 0x0101010101010101ULL;
uint64_t tmp = (xored & mask) + mask;
return ~(tmp | xored | mask);
}
However, this puts 0x80
into 'lanes' which match, and 0x00
into 'lanes' that don't, whereas I want 0xFF
in 'lanes' that match, and 0x00
in 'lanes' that don't. Is it possible to write this without branching?
For the record, this is just a variant for calculating a high bit in nonzero bytes (one instruction less) put together with the comments from @njuffa and @Nate Eldredge (probably slightly more efficient than in 4386427's answer).
uint64_t compare_eq (uint64_t x, uint64_t y) {
uint64_t xored = x ^ y;
uint64_t mask = ((((xored >> 1) | 0x8080808080808080) - xored) & 0x8080808080808080);
return (mask << 1) - (mask >> 7);
}