In our application (a network daemon), there are roughly three uses of heap-allocated memory:
1) Memory allocated on startup to hold the result of parsing the application's global configuration.
2) Memory allocated for thread-specific data as threads are created (and freed as they're destroyed).
3) Memory allocated when servicing requests and bound to the lifetime of the request.
We use talloc to manage memory in all three cases.
We've recently run into memory corruption issues where bad pointer values have meant one or more threads were writing into the global configuration, causing crashes.
Because of the way the application is structured, nothing should ever write to the memory allocated in case 1) after the application starts processing requests.
Is there a way of marking the memory allocated in case 1) as read-only?
The POSIX specification provides a function, mprotect, which allows the permissions (read/write/execute) on individual pages of memory to be changed.
The problem with using mprotect to mark up parts of the heap as read-only is that the finest granularity is a single page, which is usually 4k (dependent on OS/architecture). Padding all heap-allocated structures to a multiple of 4k would cause massive memory bloat, boo.
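To make the page granularity concrete, here's a minimal, self-contained sketch (illustrative only, not from our code) of mprotect applied to a single anonymous page obtained with mmap:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t page_size = (size_t)getpagesize();

	/* mmap returns page aligned memory, which satisfies mprotect's requirements */
	char *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANON, -1, 0);
	if (page == MAP_FAILED) exit(1);

	strcpy(page, "global config lives here");		/* Still writable */

	if (mprotect(page, page_size, PROT_READ) < 0) exit(1);	/* The whole page becomes read-only */

	printf("%s\n", page);					/* Reads are fine */
	/* page[0] = 'X'; */					/* ...but this write would now fault */

	munmap(page, page_size);
	return 0;
}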
So in order to use mprotect for case 1), we need to get all the data we want to protect into one contiguous area of memory.
Talloc can help out here. talloc pools are a type of slab allocation that can give large performance gains when used correctly, and (if of sufficient size) allow all allocations within the pool to be made in a single contiguous memory area.
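As a quick illustration of the API (a hypothetical sketch; the config_t structure and its fields are invented, not our real configuration code), everything allocated under the pool context is carved out of the pool's single pre-allocated slab:

#include <talloc.h>

/* Hypothetical configuration structure, purely for illustration */
typedef struct {
	char const	*server_name;
	unsigned int	max_connections;
} config_t;

int main(void)
{
	TALLOC_CTX *pool = talloc_pool(NULL, 64 * 1024);	/* One contiguous 64k slab */
	config_t *config;
	int ret = 1;

	if (!pool) return 1;

	config = talloc_zero(pool, config_t);			/* Served from the pool, no malloc() */
	if (config) {
		config->server_name = talloc_strdup(config, "example.org");	/* Also served from the pool */
		config->max_connections = 1024;
		ret = 0;
	}

	talloc_free(pool);	/* Frees the pool and everything allocated inside it */
	return ret;
}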
Great! Problem solved: allocate a talloc memory pool, do all the instantiation and parsing work, use mprotect to mark the pool as read-only, done! Unfortunately, it's not quite that simple...
There are three additional issues to solve:
1) mprotect needs the length of memory to be a multiple of the page size.
2) mprotect needs the start address to be page aligned.
3) talloc pools need their size specified up front, so we need to know how much memory to allocate for the pool.
Problem 1 is simple enough, we just need to round up to a multiple of the page size (which can be conveniently retrieved with getpagesize):
size_t rounded;
size_t page_size;

page_size = (size_t)getpagesize();
rounded = ((size + (page_size - 1)) / page_size) * page_size;	/* 'size' is the requested pool size */
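As a sanity check of the arithmetic (assuming a 4k page size, which is an assumption rather than a guarantee):

/* Assuming page_size == 4096 */
/*   size = 1     ->  rounded = 4096 */
/*   size = 4096  ->  rounded = 4096 */
/*   size = 4097  ->  rounded = 8192 */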
Problem 2, it turns out, is also pretty easy. If we allocate a single byte within the pool, we can predict where the first 'real' allocation will occur. We can also subtract the pool's address from the allocation's address to figure out how much memory talloc is going to use for the chunk headers.
With this information, we can (if needed) perform a second allocation to pad the pool memory to the next page boundary, ensuring 'real' allocations occur within the protected region. We can then return the address of the next page for use in mprotect. The only slight issue here is that we need to over-allocate the pool by one page to ensure there's enough memory.
Problem 3 is annoying, and solutions are, unfortunately, application specific. If there are no side effects from performing all the instantiation in case 1), and the amount of memory used is consistent, a two-pass approach could be used to figure out how much memory to allocate to the pool.
Pass 1 would use talloc_init to get the top level chunk and talloc_total_size to reveal how much memory was in use; pass 2 would allocate a pool of an appropriate size.
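A rough sketch of how that two-pass approach might look (the config_parse helper and the path are hypothetical, and the headroom factor is a guess, since talloc_total_size() may not account for talloc's own per-chunk overhead):

#include <talloc.h>

/* Stand-in for the real parser: just allocates some example data under ctx */
static int config_parse(TALLOC_CTX *ctx, char const *path)
{
	(void)path;
	if (!talloc_strdup(ctx, "server_name = example.org")) return -1;
	if (!talloc_zero_array(ctx, char, 256)) return -1;
	return 0;
}

TALLOC_CTX *config_parse_pooled(char const *path)
{
	TALLOC_CTX	*tmp, *pool;
	size_t		needed;

	/* Pass 1 - parse into a throwaway context just to measure memory use */
	tmp = talloc_init("config sizing pass");
	if (!tmp) return NULL;

	if (config_parse(tmp, path) < 0) {
		talloc_free(tmp);
		return NULL;
	}
	needed = talloc_total_size(tmp);
	talloc_free(tmp);

	/* Pass 2 - parse again into a pool sized from the first pass,
	 * with headroom for talloc's per-chunk headers (the factor is a guess) */
	pool = talloc_pool(NULL, (needed * 3) / 2);
	if (!pool) return NULL;

	if (config_parse(pool, path) < 0) {
		talloc_free(pool);
		return NULL;
	}

	return pool;
}

int main(void)
{
	TALLOC_CTX *config_ctx = config_parse_pooled("/etc/mydaemon.conf");	/* Path is illustrative */
	if (!config_ctx) return 1;
	talloc_free(config_ctx);
	return 0;
}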
For our specific use-case, we just allow the user to determine the pool size. This is because we're using protected memory as a debugging feature, so the user is also a developer, and allocating 1G of memory to ensure there's enough for the configuration is not a problem.
So what does all this look like? Well here's the function I came up with:
/** Return a page aligned talloc memory pool
 *
 * Because we can't intercept talloc's malloc() calls, we need to do some tricks
 * in order to get the first allocation in the pool page aligned, and to limit
 * the size of the pool to a multiple of the page size.
 *
 * The reason for wanting a page aligned talloc pool is that it allows us to
 * mprotect() the pages that belong to the pool.
 *
 * Talloc chunks appear to be allocated within the protected region, so this should
 * catch frees too.
 *
 * @param[in] ctx	to allocate pool memory in.
 * @param[out] start	A page aligned address within the pool.  This can be passed
 *			to mprotect().
 * @param[out] end	of the pages that should be protected.
 * @param[in] size	How big to make the pool.  Will be corrected to a multiple
 *			of the page size.  The actual pool size will be size
 *			rounded up to a multiple of the page size, plus one page.
 */
TALLOC_CTX *talloc_page_aligned_pool(TALLOC_CTX *ctx, void **start, void **end, size_t size)
{
	size_t		rounded, page_size = (size_t)getpagesize();
	size_t		hdr_size, pool_size;
	void		*next, *chunk;
	TALLOC_CTX	*pool;

#define ROUND_UP(_num, _mul) (((((_num) + ((_mul) - 1))) / (_mul)) * (_mul))

	rounded = ROUND_UP(size, page_size);		/* Round up to a multiple of the page size */
	if (rounded == 0) rounded = page_size;

	pool_size = rounded + page_size;
	pool = talloc_pool(ctx, pool_size);		/* Over allocate */
	if (!pool) return NULL;

	chunk = talloc_size(pool, 1);			/* Get the starting address */
	assert((chunk > pool) && ((uintptr_t)chunk < ((uintptr_t)pool + rounded)));
	hdr_size = (uintptr_t)chunk - (uintptr_t)pool;

	next = (void *)ROUND_UP((uintptr_t)chunk, page_size);	/* Round up address to the next page */

	/*
	 *	Depending on how talloc allocates the chunk headers,
	 *	the memory allocated here might not align to a page
	 *	boundary, but that's ok, we just need future allocations
	 *	to occur on or after 'next'.
	 */
	if (((uintptr_t)next - (uintptr_t)chunk) > 0) {
		size_t	pad_size;
		void	*padding;

		pad_size = ((uintptr_t)next - (uintptr_t)chunk);
		if (pad_size > hdr_size) {
			pad_size -= hdr_size;		/* Save ~111 bytes by not over-padding */
		} else {
			pad_size = 1;
		}

		padding = talloc_size(pool, pad_size);
		assert(((uintptr_t)padding + (uintptr_t)pad_size) >= (uintptr_t)next);
	}

	*start = next;					/* This is the address we feed into mprotect */
	*end = (void *)((uintptr_t)next + (uintptr_t)rounded);

	talloc_set_memlimit(pool, pool_size);		/* Don't allow allocations outside of the pool */

	return pool;
}
The above also uses talloc_set_memlimit to ensure no allocations can occur outside of the contiguous region. Putting it all together, usage looks something like this:
TALLOC_CTX *global_ctx;
size_t pool_size = 1024;
void *pool_page_start = NULL, *pool_page_end = NULL;
global_ctx = talloc_page_aligned_pool(talloc_autofree_context(), &pool_page_start, &pool_page_end, pool_size);
/* Allocate things in global_ctx */
...
/* Done allocating/writing - protect */
if (mprotect(pool_page_start, (uintptr_t)pool_page_end - (uintptr_t)pool_page_start, PROT_READ) < 0) {
	exit(1);
}
/* Process requests */
...
/* Done processing - unprotect (so we can free) */
mprotect(pool_page_start, (uintptr_t)pool_page_end - (uintptr_t)pool_page_start,
PROT_READ | PROT_WRITE);
When there is an errant write into protected memory on macOS, you'll see a SEGV, and if running under lldb, you'll get a full backtrace showing exactly where the bad write was.
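If you also want a useful report outside of a debugger, a small fault handler can log the address of the bad write before the process dies. This is a generic POSIX sketch rather than part of the pool code, and it registers both SIGSEGV and SIGBUS since platforms differ in which signal they deliver for protection faults:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Report the faulting address, then die.  snprintf() isn't strictly
 * async-signal-safe, but for a crash-path diagnostic it's a common compromise. */
static void bad_write_handler(int sig, siginfo_t *info, void *uctx)
{
	char	buf[80];
	int	len;

	(void)uctx;
	len = snprintf(buf, sizeof(buf), "fatal: signal %d, faulting address %p\n", sig, info->si_addr);
	if (len > 0) write(STDERR_FILENO, buf, (size_t)len);
	_exit(1);
}

void install_fault_handler(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = bad_write_handler;
	act.sa_flags = SA_SIGINFO;
	sigemptyset(&act.sa_mask);

	sigaction(SIGSEGV, &act, NULL);
	sigaction(SIGBUS, &act, NULL);
}

Calling install_fault_handler() once during startup is enough; the protected-pool code itself doesn't need to change.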