How can I create a __m128i having the n most significant bits set (in the entire vector)? I need this to mask the portions of a buffer that are relevant for a computation. If possible, the solution should have no branches, but this seems hard to achieve.
How can I do this?
You can use one of the methods from this question to generate a mask whose most significant bytes are set to all ones. You would then just need to fix up the remaining bits when n is not a multiple of 8.
I suggest trying something like this:
- init vector A = all (8 bit) elements set to the residual mask, i.e. a byte with its top n % 8 bits set
- init vector B = mask of n / 8 bytes using one of the above-mentioned methods
- init vector C = mask of (n + 7) / 8 bytes using one of the above-mentioned methods
- result = (A | B) & C
So for example if n = 36 (most significant byte on the left):
A = f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0
B = ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00
C = ff ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00
==> ff ff ff ff f0 00 00 00 00 00 00 00 00 00 00 00
This would be branchless, as required, but it's probably of the order of ~10 instructions. There may be a more efficient method but I would need to give this some more thought.
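The steps above could be sketched roughly as follows, using only SSE2 intrinsics. Note that the `byte_mask` helper is my own assumption (a signed compare against a vector of byte indices) and is not necessarily one of the methods from the linked question; the names `byte_mask` and `msb_mask` are made up for illustration. The sketch assumes 0 <= n <= 128.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: all-ones in the k most significant bytes, 0 <= k <= 16.
   Byte i (0 = least significant) is set when i > 15 - k. */
static __m128i byte_mask(unsigned k)
{
    const __m128i idx = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                      8, 9, 10, 11, 12, 13, 14, 15);
    const __m128i thresh = _mm_set1_epi8((char)(15 - (int)k));
    return _mm_cmpgt_epi8(idx, thresh);  /* signed compare, thresh is -1..15 */
}

/* Hypothetical name: mask with the n most significant bits set, 0 <= n <= 128. */
static __m128i msb_mask(unsigned n)
{
    assert(n <= 128);  /* debug-only precondition check, compiled out with NDEBUG */

    /* A: every byte holds the residual mask (top n % 8 bits set, 0 if n % 8 == 0) */
    const __m128i a = _mm_set1_epi8((char)(uint8_t)(0xFF00u >> (n & 7)));
    /* B: the n / 8 fully set bytes */
    const __m128i b = byte_mask(n / 8);
    /* C: the (n + 7) / 8 bytes that contain at least one set bit */
    const __m128i c = byte_mask((n + 7) / 8);

    return _mm_and_si128(_mm_or_si128(a, b), c);  /* (A | B) & C */
}
```

When the result is stored to memory with `_mm_storeu_si128`, the most significant byte ends up at offset 15, so the n = 36 example above appears reversed in a byte dump.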