Tags: c, performance, gcc, glibc, strcpy

Different performance between glibc's strcpy() and the same code in my program


I am developing a program that copies strings, and I compared its performance with glibc. I downloaded the glibc source with this command:

apt-get source glibc

I compared the following two versions:

  1. the C code from /glibc-2.19/string/strcpy.c, copied into my program as strcpy_glibc();
  2. the C library's strcpy(), via #include <string.h>.

I expected the performance to be similar. However, the results were completely different.

I tried several gcc optimization levels (-O1, -O2, -O3), but the results were similar.

Is there some kind of trick that makes the library version faster? I would like to understand the reason.

Here is the code:

// test for performance.

/******************************************************************************/

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stddef.h>


/******************************************************************************/
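/* Generic strcpy() from glibc-2.19 string/strcpy.c, with the old K&R-style
   parameter list rewritten as a prototype (same behavior). */
char *
strcpy_glibc (char *dest, const char *src)
{
  char c;
  char *s = (char *) src;
  /* After s++ has read src[k], s[off] is dest[k], so a single
     pointer walks both buffers. */
  const ptrdiff_t off = dest - s - 1;

  do
    {
      c = *s++;
      s[off] = c;
    }
  while (c != '\0');

  return dest;
}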

/******************************************************************************/
void test(int iLoop, int iLen,
    const char *szFuncName, char *(*func)(char *s1, const char *s2))
{
    time_t          tm1, tm2;
    int             i;
    char   s1[512];
    char   s2[512];

    // initialize the test string.
    for(i = 0; i < iLen; i++) {
        s1[i] = '@';
    }
    s1[iLen] = '\0';

    /**************************************************************************/
    printf("test(): %s() started, iLoop = %d, iLen = %d.\n",
        szFuncName, iLoop, iLen);

    tm1 = time(NULL);

    for(i = 0; i < iLoop; i++) {
        func(s2, s1);
        func(s1, s2);
        func(s2, s1);
        func(s1, s2);
        func(s2, s1);

        func(s1, s2);
        func(s2, s1);
        func(s1, s2);
        func(s2, s1);
        func(s1, s2);
    }

    tm2 = time(NULL);

    printf("test(): %s() terminated in %d [sec].\n", szFuncName, (int)(tm2 - tm1));
    printf("test(): %s() answer s1[0] = %c.\n", szFuncName, s1[0]);
}

/******************************************************************************/
int main(int argc, char *argv[])
{
    printf("main(): Started.\n");

    test(100000000, 511, "strcpy_glibc", strcpy_glibc);
    test(100000000, 511, "strcpy", strcpy);
    test(100000000, 511, "strcpy_glibc", strcpy_glibc);
    test(100000000, 511, "strcpy", strcpy);

    printf("main(): Terminated.\n");
    return 0;
}

/******************************************************************************/
/* EOF */

And here is the result:

************************$ ./strcpy_test_3
main(): Started.
test(): strcpy_glibc() started, iLoop = 100000000, iLen = 511.
test(): strcpy_glibc() terminated in 238 [sec].
test(): strcpy_glibc() answer s1[0] = @.
test(): strcpy() started, iLoop = 100000000, iLen = 511.
test(): strcpy() terminated in 56 [sec].
test(): strcpy() answer s1[0] = @.
test(): strcpy_glibc() started, iLoop = 100000000, iLen = 511.
test(): strcpy_glibc() terminated in 238 [sec].
test(): strcpy_glibc() answer s1[0] = @.
test(): strcpy() started, iLoop = 100000000, iLen = 511.
test(): strcpy() terminated in 55 [sec].
test(): strcpy() answer s1[0] = @.
main(): Terminated.
************************$

So strcpy() is about four times faster than strcpy_glibc() (55-56 s versus 238 s), even though the code is the same.

I'm very confused...


Solution

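  • You can't simply copy the generic C code out of glibc and expect the same performance. string/strcpy.c is only the portable fallback; on common architectures such as x86-64, the strcpy() you actually call is hand-written assembly from glibc's sysdeps/ tree (for example SSE2-based versions), and the best variant for the running CPU is selected at run time. So a large performance difference is expected.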

    Try this:

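    /* Byte-by-byte copy; always_inline asks GCC to inline it at
       direct call sites. */
    static __inline__ __attribute__((always_inline))
    char * strcpy_glibc(char * __restrict to, const char * __restrict from)
    {
        char *save = to;

        /* Copy bytes until the terminating '\0' has been copied. */
        for (; (*to = *from); ++from, ++to)
            ;
        return(save);
    }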
    

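    Instead of calling through a function pointer, call this inline function directly in your application, as long as it is not called from too many places. This removes the indirect call and may improve performance, but the loop still copies one byte at a time, and it does not handle corner cases or perform any checks.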