My question is about what the compiler is doing in this case that optimizes the code way more than what I would think is possible.
Given this enum:
enum MyEnum {
Entry1,
Entry2,
... // Entry3..27 are the same, omitted for size.
Entry28,
Entry29
};
And this function:
bool MyFunction(MyEnum e)
{
if (
e == MyEnum::Entry1 ||
e == MyEnum::Entry3 ||
e == MyEnum::Entry8 ||
e == MyEnum::Entry14 ||
e == MyEnum::Entry15 ||
e == MyEnum::Entry18 ||
e == MyEnum::Entry21 ||
e == MyEnum::Entry22 ||
e == MyEnum::Entry25)
{
return true;
}
return false;
}
For the function, MSVC generates this assembly when compiled with -Ox optimization flag (Godbolt):
bool MyFunction(MyEnum) PROC ; MyFunction
cmp ecx, 24
ja SHORT $LN5@MyFunction
mov eax, 20078725 ; 01326085H
bt eax, ecx
jae SHORT $LN5@MyFunction
mov al, 1
ret 0
$LN5@MyFunction:
xor al, al
ret 0
Clang generates similar (slightly better, one less jump) assembly when compiled with -O3 flag:
MyFunction(MyEnum): # @MyFunction(MyEnum)
cmp edi, 24
ja .LBB0_2
mov eax, 20078725
mov ecx, edi
shr eax, cl
and al, 1
ret
.LBB0_2:
xor eax, eax
ret
What is happening here? I see that even if I add more enum comparisons to the function, the assembly that is generated does not actually become "more", it's only this magic number (20078725) that changes. That number depends on how many enum comparisons are happening in the function. I do not understand what is happening here.
The reason why I am looking at this is that I was wondering if it is good to write the function as above, or alternatively like this, with bitwise comparisons:
bool MyFunction2(MyEnum e)
{
if (
e == MyEnum::Entry1 |
e == MyEnum::Entry3 |
e == MyEnum::Entry8 |
e == MyEnum::Entry14 |
e == MyEnum::Entry15 |
e == MyEnum::Entry18 |
e == MyEnum::Entry21 |
e == MyEnum::Entry22 |
e == MyEnum::Entry25)
{
return true;
}
return false;
}
This results in this generated assembly with MSVC:
bool MyFunction2(MyEnum) PROC ; MyFunction2
xor edx, edx
mov r9d, 1
cmp ecx, 24
mov eax, edx
mov r8d, edx
sete r8b
cmp ecx, 21
sete al
or r8d, eax
mov eax, edx
cmp ecx, 20
cmove r8d, r9d
cmp ecx, 17
sete al
or r8d, eax
mov eax, edx
cmp ecx, 14
cmove r8d, r9d
cmp ecx, 13
sete al
or r8d, eax
cmp ecx, 7
cmove r8d, r9d
cmp ecx, 2
sete dl
or r8d, edx
test ecx, ecx
cmove r8d, r9d
test r8d, r8d
setne al
ret 0
Since I do not understand what happens in the first case, I can not really judge which one is more efficient in my case.
Quite smart! The first comparison with 24 is to do a rough range check - if it's more than 24 or less than 0 it will bail out; this is important as the instructions that follow that operate on the magic number have a hard cap to [0, 31] for operand range.
For the rest, the magic number is just a bitmask, with the bits corresponding to the "good" values set.
>>> bin(20078725)
'0b1001100100110000010000101'
It's easy to spot the first and third bits (counting from 1 and from right) set, the 8th, 14th, 15th, ...
MSVC checks it "directly" using the BT (bit test) instruction and branching, clang instead shifts it of the appropriate amount (to get the relevant bit in the lowest order position) and keeps just it ANDing it with zero (avoiding a branch).
The C code corresponding to the clang version would be something like:
bool MyFunction(MyEnum e) {
if(unsigned(e) > 24) return false;
return (20078725 >> e) & 1;
}
as for the MSVC version, it's more like
inline bool bit_test(unsigned val, int bit) {
return val & (1<<bit);
}
bool MyFunction(MyEnum e) {
if(unsigned(e) > 24) return false;
return bit_test(20078725, e);
}
(I kept the bit_test
function separated to emphasize that it's actually a single instruction in assembly, that val & (1<<bit)
thing has no correspondence to the original assembly.
As for the if
-based code, it's quite bad - it uses a lot of CMOV and ORs the results together, which is both longer code, and will probably serialize execution. I suspect the corresponding clang code will be better. OTOH, you wrote this code using bitwise OR (|
) instead of the more semantically correct logical OR (||
), and the compiler is strictly following your orders (typical of MSVC).
Another possibility to try instead could be a switch
- but I don't think there's much to gain compared to the code already generated for the first snippet, which looks pretty good to me.
Ok, doing a quick test with all the versions against all compilers, we can see that:
switch
case;switch
results are varied: