Tags: python, malloc, ctypes, libc, libpq

How to force/test malloc failure in shared library when called via Python ctypes


I have a Python program that calls a shared library (libpq in this case) that itself calls malloc under the hood.

I want to be able to test (i.e. in unit tests) what happens when those calls to malloc fail (e.g. when there isn't enough memory).

How can I force that?

Note: I don't think setting a resource limit on the process using ulimit -d would work. It would need to be precise and robust enough to make a single malloc call inside libpq fail (for example, one inside PQconnectdbParams) while all others succeed, across different versions of Python, and even across different levels of resource usage within the same version of Python.


Solution

  • It's possible, but it's tricky. In summary:

    • You can override malloc in a shared library, test_override_malloc.so say, and then (on Linux at least) use the LD_PRELOAD environment variable to load it.

    • But... Python calls malloc all over the place, and you need those calls to succeed. To pick out the "right" calls to malloc to fail, you can use the glibc functions "backtrace" and "backtrace_symbols" to inspect the stack and decide whether the current call is one that should fail.

    • This shared library exposes a small API to control which calls to malloc will fail (so it doesn't need to be hard coded in the library)

    • To allow some calls to malloc to succeed, you need a pointer to the original malloc function. However, to find this you need to call dlsym, which itself can call malloc. So you need to build a simple allocator into the new malloc so that these recursive calls to malloc succeed. Thanks to https://stackoverflow.com/a/10008252/1319998 for this tip.
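
The counting rule that the small API controls can be easy to misread, so here is a minimal Python model of it. This is only a sketch of the decision logic in the C code further down, not part of the library: the class name `FailInModel` and the method `should_fail` are hypothetical, and the stack is represented as a plain list of strings standing in for what backtrace_symbols would return.

```python
class FailInModel:
    """Hypothetical Python model of the fail_in / search_string logic."""

    def __init__(self):
        self.fail_in = -1          # -1 means never fail
        self.search_string = ""

    def set_fail_in(self, fail_in, search_string):
        self.fail_in = fail_in
        self.search_string = search_string

    def should_fail(self, stack_symbols):
        # Failing is disabled entirely
        if self.fail_in == -1:
            return False
        # Calls whose stack does not mention search_string always succeed
        # and do not touch the counter
        if not any(self.search_string in s for s in stack_symbols):
            return False
        # A matching call succeeds while the counter is positive and
        # fails once it has reached zero; the counter drops either way
        fail = not (self.fail_in > 0)
        self.fail_in -= 1
        return fail


model = FailInModel()
model.set_fail_in(1, "libpq.so")
matching = ["python3", "libpq.so.5(PQconnectdbParams+0x10)"]
other = ["python3", "libc.so.6"]
results = [
    model.should_fail(matching),  # first matching call: allowed
    model.should_fail(other),     # non-matching call: allowed, not counted
    model.should_fail(matching),  # second matching call: fails
    model.should_fail(matching),  # counter is back at -1: allowed
]
```

Note that after the one failure the counter lands on -1, so under this model all later calls succeed again until set_fail_in is called once more.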

    In more detail:

    The shared library code

    // In test_override_malloc.c
    // Some of this code is inspired by https://stackoverflow.com/a/10008252/1319998
    
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <execinfo.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    // Fails a malloc call whose backtrace contains search_string, after
    // letting fail_in such calls succeed first (0 means the very next one)
    // -1 means never fail
    static int fail_in = -1;
    static char search_string[1024];
    
    // To find the original address of malloc during malloc, dlsym will be
    // called, which might itself allocate memory via malloc
    static char initialising_buffer[10240];
    static int initialising_buffer_pos = 0;
    
    // The pointers to original memory management functions to call
    // when we don't want to fail
    static void *(*original_malloc)(size_t) = NULL;
    static void (*original_free)(void *ptr) = NULL;
    
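    void set_fail_in(int _fail_in, char *_search_string) {
        fail_in = _fail_in;
        // strncpy does not null-terminate when the source fills the buffer,
        // so copy one byte less and terminate explicitly
        strncpy(search_string, _search_string, sizeof(search_string) - 1);
        search_string[sizeof(search_string) - 1] = '\0';
    }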
    
    void *
    malloc(size_t size) {
        void *memory = NULL;
        int trace_size = 100;
        void *stack[trace_size];
    
        static int initialising = 0;
        static int level = 0;
    
        // Save original
        if (!original_malloc) {
            if (initialising) {
                if (size + initialising_buffer_pos >= sizeof(initialising_buffer)) {
                    exit(1);
                }
                void *ptr = initialising_buffer + initialising_buffer_pos;
                initialising_buffer_pos += size;
                return ptr;
            }
    
            initialising = 1;
            original_malloc = dlsym(RTLD_NEXT, "malloc");
            original_free = dlsym(RTLD_NEXT, "free");
            initialising = 0;
        }
    
        // If we're in a nested malloc call (the backtrace functions below can call malloc)
        // then call the original malloc
        if (level) {
            return original_malloc(size);
        }
        ++level;
    
        if (fail_in == -1) {
            memory = original_malloc(size);
        } else {
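            // Check whether search_string appears in the current backtrace.
            // backtrace returns the number of frames actually captured, so
            // only that many entries of symbols are valid
            int frames = backtrace(stack, trace_size);
            char **symbols = backtrace_symbols(stack, frames);
            int found = 0;
            for (int i = 0; symbols != NULL && i < frames; ++i) {
                if (strstr(symbols[i], search_string) != NULL) {
                    found = 1;
                    break;
                }
            }
            free(symbols);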
    
            if (!found) {
                memory = original_malloc(size);
            } else {
                if (fail_in > 0) {
                    memory = original_malloc(size);
                }
                --fail_in;
            }
        }
    
        --level;
        return memory;
    }
    
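    void free(void *ptr) {
        // Memory handed out from initialising_buffer was never allocated by
        // the original malloc, so it must not be passed to the original free
        char *p = ptr;
        if (p >= initialising_buffer && p < initialising_buffer + sizeof(initialising_buffer)) {
            return;
        }
        if (original_free) {
            original_free(ptr);
        }
    }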
    
    

    Compiled with

    gcc -shared -fPIC test_override_malloc.c -o test_override_malloc.so -ldl
    

    Example Python code

    This could go inside the unit tests

    # Inside my_test.py
    
    from ctypes import cdll
    cdll.LoadLibrary('./test_override_malloc.so').set_fail_in(0, b'libpq.so')
    
    # ... then call a function in the shared library libpq.so
    # The `0` above means the very next call it makes to malloc will fail
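
In a test suite it may be convenient to make sure failure injection is switched off again once the call under test returns, so that later tests are not affected. A minimal sketch of that idea follows, assuming the library is built and preloaded as described in this answer. The helper names `load_override` and `malloc_failing` are hypothetical; only `set_fail_in` comes from the C code above.

```python
import ctypes
from contextlib import contextmanager


def load_override(path="./test_override_malloc.so"):
    """Load the override library and declare the set_fail_in signature."""
    lib = ctypes.CDLL(path)
    # Matches: void set_fail_in(int _fail_in, char *_search_string)
    lib.set_fail_in.argtypes = [ctypes.c_int, ctypes.c_char_p]
    lib.set_fail_in.restype = None
    return lib


@contextmanager
def malloc_failing(lib, fail_in, search_string):
    """Enable failure injection for the body of the with-block only."""
    lib.set_fail_in(fail_in, search_string)
    try:
        yield
    finally:
        # -1 disables failing, whatever happened inside the block
        lib.set_fail_in(-1, b"")
```

A test could then wrap just the libpq call, for example `with malloc_failing(lib, 0, b'libpq.so'): ...`, and assert on the error the call reports.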
    

    Run with

    LD_PRELOAD=$PWD/test_override_malloc.so python3 my_test.py
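
If the test runner itself cannot be started with LD_PRELOAD set, one option is to launch the test script in a child process with the variable added to its environment. The following is a rough sketch of that approach; the helper names `preload_env` and `run_with_preload` are made up for illustration, and the script and library paths are whatever your project uses.

```python
import os
import subprocess
import sys


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Entries in LD_PRELOAD may be separated by colons (or spaces)
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def run_with_preload(script, lib_path):
    """Run script in a child Python interpreter with the library preloaded."""
    return subprocess.run(
        [sys.executable, script],
        env=preload_env(os.path.abspath(lib_path)),
        capture_output=True,
        text=True,
    )
```

For example, `run_with_preload("my_test.py", "test_override_malloc.so")` returns a CompletedProcess whose returncode and output can be checked by the parent test.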
    

    (Admittedly, this might not be worth it. Since Python itself calls malloc a lot, I wonder whether, in most real situations, it's unlikely that all of Python's allocations would succeed while just the one call in the library fails.)