I have the following code:
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>
#include <stdlib.h> /* atoll */

long long lzcnt(long long l)
{
    return __lzcnt64(l);
}

int main(int argc, char** argv)
{
    printf("%lld\n", lzcnt(atoll(argv[1])));
    return 0;
}
Running with different compilers and options I get (assembly shown):
Clang
$ clang -Wall src/test.c -D__LZCNT__ && ./a.out 2047
53
0000000000400560 <lzcnt>:
400560: 55 push %rbp
400561: 48 89 e5 mov %rsp,%rbp
400564: 48 89 7d f0 mov %rdi,-0x10(%rbp)
400568: 48 8b 7d f0 mov -0x10(%rbp),%rdi
40056c: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400570: 48 8b 7d f8 mov -0x8(%rbp),%rdi
400574: 48 0f bd ff bsr %rdi,%rdi
400578: 48 83 f7 3f xor $0x3f,%rdi
40057c: 89 f8 mov %edi,%eax
40057e: 48 63 c0 movslq %eax,%rax
400581: 5d pop %rbp
400582: c3 retq
400583: 66 66 66 66 2e 0f 1f data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
40058a: 84 00 00 00 00 00
GCC without -mlzcnt
$ gcc -Wall src/test.c -D__LZCNT__ && ./a.out 2047
53
0000000000400580 <lzcnt>:
400580: 55 push %rbp
400581: 48 89 e5 mov %rsp,%rbp
400584: 48 89 7d e8 mov %rdi,-0x18(%rbp)
400588: 48 8b 45 e8 mov -0x18(%rbp),%rax
40058c: 48 89 45 f8 mov %rax,-0x8(%rbp)
400590: 48 0f bd 45 f8 bsr -0x8(%rbp),%rax
400595: 48 83 f0 3f xor $0x3f,%rax
400599: 48 98 cltq
40059b: 5d pop %rbp
40059c: c3 retq
GCC with -mlzcnt
$ gcc -Wall src/test.c -D__LZCNT__ -mlzcnt && ./a.out 2047
10
0000000000400580 <lzcnt>:
400580: 55 push %rbp
400581: 48 89 e5 mov %rsp,%rbp
400584: 48 89 7d e8 mov %rdi,-0x18(%rbp)
400588: 48 8b 45 e8 mov -0x18(%rbp),%rax
40058c: 48 89 45 f8 mov %rax,-0x8(%rbp)
400590: f3 48 0f bd 45 f8 lzcnt -0x8(%rbp),%rax
400596: 48 98 cltq
400598: 5d pop %rbp
400599: c3 retq
G++ without -mlzcnt
$ g++ -Wall src/test.c -D__LZCNT__ && ./a.out 2047
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.8.2/include/immintrin.h:64:0,
from /usr/lib/gcc/x86_64-redhat-linux/4.8.2/include/x86intrin.h:62,
from src/test.c:3:
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include/lzcntintrin.h: In function ‘short unsigned int __lzcnt16(short unsigned int)’:
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include/lzcntintrin.h:38:29: error: ‘__builtin_clzs’ was not declared in this scope
return __builtin_clzs (__X);
G++ with -mlzcnt
$ g++ -Wall src/test.c -D__LZCNT__ -mlzcnt && ./a.out 2047
10
0000000000400640 <_Z5lzcntx>:
400640: 55 push %rbp
400641: 48 89 e5 mov %rsp,%rbp
400644: 48 89 7d e8 mov %rdi,-0x18(%rbp)
400648: 48 8b 45 e8 mov -0x18(%rbp),%rax
40064c: 48 89 45 f8 mov %rax,-0x8(%rbp)
400650: f3 48 0f bd 45 f8 lzcnt -0x8(%rbp),%rax
400656: 48 98 cltq
400658: 5d pop %rbp
400659: c3 retq
The difference is quite clearly the use of -mlzcnt. However, I'm actually working in C++, and without that option the code doesn't compile with g++ (clang++ is fine). It looks like when -mlzcnt is used, the result is 63 - (result without -mlzcnt). Is there any documentation on the -mlzcnt option for gcc? I looked through the info files but couldn't find anything. Does it do anything more than opt for the lzcnt instruction?
First off, I'm able to perfectly replicate your problem with both clang 3.3 and gcc 4.8.1.
Here are my thoughts, though I'm only about 50% sure of this.
Let's look at my system (which is a Xeon X3430, Lynnfield, Nehalem).
[4:48pm][wlynch@apple /tmp] sudo cpuid -1ir | grep 80000001
0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001 edx=0x28100800
LZCNT support is reported in bit 5 of ECX for this leaf, and that bit is clear here (ECX is 0x00000001). So my system doesn't support LZCNT.
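To make the check above concrete, here is a minimal sketch that decodes the same bit in C. The helper names `ecx_reports_lzcnt` and `cpu_reports_lzcnt` are made up for illustration, and the second one assumes a GCC or clang toolchain that ships `<cpuid.h>` with `__get_cpuid`.

```c
#include <stdint.h>

/* Hypothetical helper: decode the LZCNT (ABM) flag, bit 5, from the ECX
   value returned by CPUID leaf 0x80000001. */
int ecx_reports_lzcnt(uint32_t ecx)
{
    return (int)((ecx >> 5) & 1u);
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/clang header; an assumption about the toolchain */

/* Hypothetical helper: ask the running CPU. Returns 1 if LZCNT is
   reported, 0 otherwise (including when the leaf is unavailable). */
int cpu_reports_lzcnt(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return ecx_reports_lzcnt(ecx);
}
#endif
```

Feeding it the ECX value from the cpuid output above (0x00000001) gives 0, which matches reading the bit by hand.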
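It also looks like my machine just happens to execute the unsupported LZCNT as a BSR. That would fit the disassembly in the question: the lzcnt encoding (f3 48 0f bd) is the bsr encoding (48 0f bd) with an f3 prefix in front. BSR returns the index of the highest set bit (10 for 2047) rather than the number of leading zeros (53), which is why the -mlzcnt builds print 10.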
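The relationship between the two results can be sketched without either instruction. This is only an illustration, not what the intrinsic header does: it assumes the GCC/clang builtin `__builtin_clzll`, which needs no -mlzcnt flag, and the names `bsr64` and `portable_lzcnt64` are hypothetical.

```c
#include <stdint.h>

/* Index of the highest set bit, i.e. what BSR yields for nonzero input.
   The result is undefined for x == 0, as with BSR itself. */
int bsr64(uint64_t x)
{
    return 63 - __builtin_clzll(x);
}

/* Leading-zero count using LZCNT's convention of returning 64 for zero.
   __builtin_clzll is undefined for 0, so that case is handled explicitly. */
int portable_lzcnt64(uint64_t x)
{
    return x ? __builtin_clzll(x) : 64;
}
```

For 2047 the highest set bit is bit 10, so `bsr64(2047)` is 10 and `portable_lzcnt64(2047)` is 53, the same pair of numbers seen in the question. A function like `portable_lzcnt64` might also serve as a workaround when the target CPU cannot be assumed to support LZCNT, since the compiler chooses the instruction sequence from the target flags.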