I am debugging a transactional processing system which is performance sensitive.
I found some code that uses __builtin_memcpy and __builtin_memset instead of memcpy and memset.
What are the __builtin_ functions for? Are they there to avoid dependency problems across architectures or compilers?
Or is there a performance reason why the __builtin_ functions are preferred?
thank you :D
A traditional library function such as the standard memcpy is just a call to a function. Unfortunately, memcpy is often called for very small copies, and the cost of calling a function, shuffling a few bytes and returning is significant (especially since memcpy adds extra work at the beginning of the function to deal with unaligned memory, loop unrolling, etc., so that it does well on LARGE copies).
So, for the compiler to optimise those calls, it needs to "know" how to do, for example, memcpy. The solution is to have a function "built in" to the compiler, which contains code such as this:
int generate_builtin_memcpy(expr arg1, expr arg2, expr size)
{
    if (is_constant(size) && eval(size) < SOME_NUMBER)
    {
        ... do magic inline memory copy ...
    }
    else
    {
        ... call "real" memcpy ...
    }
}
[For retargetable compilers, there is typically one of these functions for each CPU architecture, each with its own configuration of the conditions under which the "real" memcpy gets called or an inline memcpy is used.]
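To make the constant-size versus run-time-size distinction concrete, here is a minimal sketch. The struct and both function names are made up for illustration. It assumes GCC or Clang with optimisation enabled and without -fno-builtin, in which case plain memcpy is already treated as a builtin; the exact code generated depends on the compiler, target and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte record, used only for illustration. */
struct record {
    uint64_t id;
    uint64_t amount;
};

/* The size is a small compile-time constant, so GCC and Clang will
   typically expand this into a few load/store instructions instead of
   emitting a call. */
static void copy_record(struct record *dst, const struct record *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* The size is only known at run time, so the compiler will usually
   emit a call to the library memcpy. */
static void copy_any(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Comparing the assembly for the two functions (for example with -O2 -S) is an easy way to see what your particular compiler decides to do.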
The key here is that you MAY actually write your own memcpy function that ISN'T based on __builtin_memcpy(): one that is ALWAYS a real function call and doesn't do exactly the same thing as the normal memcpy [you'd be in a bit of trouble if you changed its behaviour a lot, since the C standard library probably calls memcpy in a few thousand places, but collecting statistics on how many times memcpy is called and what sizes are copied could be one such use case].
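The statistics use case could look roughly like the sketch below. To keep it safe to compile anywhere, it uses a separately named wrapper rather than replacing memcpy itself; counting_memcpy and both counters are hypothetical names, and the counters are not thread-safe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counters, not part of any library. */
static unsigned long copy_calls;
static unsigned long copy_bytes_total;

/* Records how often it is called and how many bytes were requested,
   then delegates the actual copy to the library memcpy. */
void *counting_memcpy(void *dst, const void *src, size_t n)
{
    copy_calls++;
    copy_bytes_total += n;
    return memcpy(dst, src, n);
}
```

Calling counting_memcpy at the copy sites you care about, and printing the counters at exit, gives a rough picture of call counts and sizes.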
Another big reason for using __builtin_* functions is that they provide functionality that would otherwise have to be written in inline assembler, or might not be available to the programmer at all. Setting or reading special registers is one such case.
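As a small illustration of that second category, the sketch below uses two builtins that GCC and Clang provide, __builtin_popcountll and __builtin_bswap32. The wrapper names are made up here, and the code will not compile on compilers that lack these builtins.

```c
#include <stdint.h>

/* Counts the bits set in x. On targets with a population-count
   instruction the compiler may use it; otherwise it falls back to a
   software sequence. */
static int count_set_bits(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Reverses the byte order of a 32-bit value (endianness swap). */
static uint32_t swap_byte_order(uint32_t x)
{
    return __builtin_bswap32(x);
}
```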
There are other techniques for solving this problem. For example, clang (through LLVM) has a library-call simplification pass that assumes calls to standard library functions have their usual meaning and replaces them with cheaper alternatives. Since printf is much "heavier" than puts, it replaces suitable calls of the form printf("constant string with no formatting\n") with puts("constant string with no formatting"), and many trigonometric and other math functions are folded into constant values when called with constant arguments.
Calling __builtin_* directly for functions like memcpy or sin is probably the WRONG thing to do: it just makes your code less portable, and it is not at all certain to be faster. Calling __builtin_special_function when there is no alternative is typically the solution in some tricky situations, but you should probably wrap it in your own function, e.g.
int get_magic_property()
{
    return __builtin_get_magic_property();
}
That way, when you port to Windows, you can easily do:
int get_magic_property()
{
#ifdef _WIN32
    return Win32GetMagicPropertyEx();
#else
    return __builtin_get_magic_property();
#endif
}
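For a version of this wrapper pattern with names that actually exist, here is a sketch built around counting leading zero bits. The wrapper name leading_zeros32 is made up; __builtin_clz is the GCC/Clang builtin and _BitScanReverse is the MSVC intrinsic from <intrin.h>. It assumes unsigned int is 32 bits wide, which holds on the common desktop and server targets but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Returns the number of leading zero bits in a nonzero 32-bit value.
   The result is undefined for x == 0 in the builtin, so reject it. */
static int leading_zeros32(uint32_t x)
{
    assert(x != 0);
#if defined(_MSC_VER)
    unsigned long index;          /* receives position of highest set bit */
    _BitScanReverse(&index, x);
    return 31 - (int)index;
#elif defined(__GNUC__)
    return __builtin_clz(x);      /* assumes 32-bit unsigned int */
#else
    /* Portable fallback: shift left until the top bit is set. */
    int n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

The rest of the code base only ever sees leading_zeros32, so a port to a new compiler means touching this one function.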