Search code examples
clinuxmacoscompilationmalloc

Weird OS-Dependent behaviour on custom malloc implementation


I'm taking a course in which we have a custom implementation of malloc, taken from this article, the implementation for the malloc can be seen here, in addition, we also have this program that tests the implementation itself (it just does a lot of memory allocations)

#include <stdlib.h>

#define MAXITERS   100000
#define MAXSIZE    1024

int main(int argc, char* argv[]) {
  float pfree = atof(argv[1]);
  int niter = rand() % MAXITERS;
  for(int i = 0; i < niter; i++ ) {
     int size = rand() % MAXSIZE;
     void* p = malloc(size * sizeof(char));
     float toss = (float)random() / RAND_MAX;
     if ( toss > pfree )
        free(p);
  }
  return 0;
}

The problem is, whenever I ran it, it would take around 30 seconds on WSL2. My friend's computer, with a M3 Pro, took around ~1 second, and so did my professor's M2 Air. Naturally, I thought this was a problem with WSL, so I quickly booted into Ubuntu 23, ran the program, and found out it takes the same time as WSl2. I then decided to run it in Fedora 39, and got the same result.

Here are the compilation commands we're using, which I think might be the reason something is being weird.

gcc -Wno-deprecated-declarations -o libmemalloc.so -fPIC -shared memalloc.c
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
gcc testalloc.c -o testalloc -L. -lmemalloc
./testalloc 0.5

The original instructions didn't come with the "export" keyword for the LD_LIBRARY_PATH since my professor works on MacOS and it seems to work there, however, on Linux, they don't work without it. Could this be the problem? Is the export functioning differently than the MacOS version?

I'll upload the .zip file with the code we're running to this repo, as well as a script that runs it, which is what I'm using for testing. It would be greatly appreciated if you guys could test the code and its speed, as well as help me on how to solve this problem.

EDIT We confirmed it was indeed using our actual implementation on the MacOS, as we added:

int x = x/0;
printf("%d", x);

To the code, and on Linux it crashed, but weirdly, on MacOS it just prints 0, I don't know if that is normal behavior, but either way, it's definitely using our implementation, as it either crashes, or prints 0.

[What I have tried]

Tried other computers and OS's, all the same (Except Windows for obvious reasons)

Tried different compilation commands, including the ones from the original article, which use LD_PRELOAD, but they continue to be very slow as well as breaking other programs on the terminal session, whereas LD_LIBRARY_PATH doesn't seem to do so.

Adding code that breaks our malloc implementation (See edit above) This is further proved by the fact that the MacOS running our version took around 1~ second, but even a 10-year-old laptop, when running the actual malloc() does it in 0.0s (using the time command), so MacOs was definitely running our version of malloc().


Solution

  • The runtime of your program is affected by the random numbers you get:

    1. Value for niter (number of iterations!!!)
    2. Value for size on each loop.
    3. Value for toss (whether you do a free).

    You're mixing rand and random calls [why?].

    On some systems (e.g. linux/glibc), rand calls random but others not.

    Also, on linux/glibc, rand_r is a simple linear congruential generator and rand is not rand_r(&seed)

    See: Linear Congruential Generator

    Therefore, to get consistent results, you are at the mercy of the given system's implementation of rand and random!


    Try replacing all rand* calls with: (e.g.) xrand. This is an LCG based on the "standard" numbers:

    static unsigned int xrand_seed;
    
    int
    xrand_r(unsigned int *seed)
    {
        int val = *seed;
        val = (val * 1103515245) + 12345;
        *seed = val;
        return val & 0x7FFFFFFF;
    }
    
    int
    xrand(void)
    {
        return xrand_r(&xrand_seed);
    }
    

    Here is a refactored version of testalloc.c:

    1. It uses the xrand implementation from above
    2. It has debug printf and timestamping so you can see how performance is affected.
    3. It allows for a -R<seed> argument to see how the random seed affects performance.
    4. The segfault from invoking the program without an argument has been fixed.
    /*
     * a simple application to exercise malloc() and free() calls
     */
    
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>
    
    #define MAXITERS   100000
    #define MAXSIZE    1024
    
    #if DEBUG
    #define dbgprt(_fmt...) \
        do { \
            dbgstamp(); \
            printf(_fmt); \
        } while (0)
    #else
    #define dbgprt(_fmt...) \
        do { } while (0)
    #endif
    
    double tsczero;
    
    double
    tscgetf(void)
    {
        struct timespec ts;
        double sec;
    
        clock_gettime(CLOCK_MONOTONIC,&ts);
        sec = ts.tv_nsec;
        sec /= 1e9;
        sec += ts.tv_sec;
    
        sec -= tsczero;
    
        return sec;
    }
    
    void
    dbgstamp(void)
    {
        double tsc = tscgetf();
    
        static unsigned long long seqno;
        printf("[%llu/%.9f] ",seqno++,tsc);
    }
    
    static unsigned int xrand_seed;
    
    int
    xrand_r(unsigned int *seed)
    {
        int val = *seed;
        val = (val * 1103515245) + 12345;
        *seed = val;
        return val & 0x7FFFFFFF;
    }
    
    int
    xrand(void)
    {
        return xrand_r(&xrand_seed);
    }
    
    int
    main(int argc, char *argv[])
    {
    
        --argc;
        ++argv;
    
        tsczero = tscgetf();
    
        for (;  argc > 0;  --argc, ++argv) {
            char *cp = *argv;
            if (*cp != '-')
                break;
    
            cp += 2;
            switch (cp[-1]) {
            case 'R':
                xrand_seed = atoll(cp);
                break;
            }
        }
    
        dbgprt("main: R=%u\n",xrand_seed);
    
        double pfree;
        if (argc > 1)
            pfree = atof(argv[1]);
        else
            pfree = 0.5;
        dbgprt("main: pfree=%g\n",pfree);
    
        int niter = xrand() % MAXITERS;
        dbgprt("main: niter=%d\n",niter);
    
        for (int i = 0; i < niter; i++) {
            dbgprt("main: ENTER i=%d/%d\n",i,niter);
    
            int size = xrand() % MAXSIZE;
            dbgprt("main: RAND size=%d\n",size);
    
            void *p = malloc(size * sizeof(char));
            dbgprt("main: RET p=%p\n",p);
    
            double toss = (double) xrand() / RAND_MAX;
            if (toss > pfree) {
                dbgprt("main: FREE\n");
                free(p);
            }
    
            dbgprt("main: EXIT\n");
        }
    
        printf("Successful\n");
    
        return 0;
    }