
OS X memset and system trace


Here is a simplified program:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <string.h>

void *worker(void *data) {
    size_t size = 1000000;
    void *area = malloc(size);
    if (area != NULL) {
        memset(area, 0, size);
        sleep(1);
        free(area);
    }
    return NULL;
}

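int main(void) {
    int number_of_threads = 4;
    pthread_t threads[number_of_threads];

    for (int i = 0; i < number_of_threads; i++) {
        /* pthread_create returns an error number instead of setting errno */
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        if (rc != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            return 1;
        }
    }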

    for (int i = 0; i < number_of_threads; i++) {
        pthread_join(threads[i], NULL);
    }

    return 0;
}

I get the following system trace with the command iprofiler -systemtrace OSXMalloc:

[Screenshot: Instruments system trace of the program]

Why does memset produce all these Zero Fill events? What do they mean, and why are there so many? I understand that I am filling 1 MB with zeros, but why isn't this done in a single call for each thread?


Solution

  • For security and privacy reasons, the kernel must guarantee that pages newly allocated to a process are filled with zeros. Otherwise, a process could see data left behind by some other process, for example passwords or financial information.

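    The zeroing happens lazily: a page is zero-filled on first access, somewhat like copy-on-write. The first touch of such a page causes a page fault, and the kernel handles it by zero-filling that single page; these are the Zero Fill events in the trace. Since memset() walks through the buffer page by page, the kernel zero-fills the pages one at a time, which is why there are so many events: with 4 KiB pages, a 1,000,000-byte buffer spans about 245 pages per thread. memset() then does redundant work, writing zeros over pages the kernel has just zeroed.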

    You would be better served by calloc() than by malloc() followed by memset(..., 0, ...). Because the malloc library knows that the kernel zero-fills freshly allocated pages, it can skip the explicit memset() that calloc()'s zero-filling contract would otherwise require. There will still be zero-fill faults on first access, but they will happen when the memory is actually used for the first time, not eagerly for an unneeded memset().

    By the way, not all allocations done through malloc() get new pages from the kernel. Some reuse pages previously allocated and freed within your process. However, for large allocations like yours, the pages are typically allocated during malloc() and deallocated during free().