numa, mbind, segfault

I have allocated memory using valloc, say an array A of 15 doubles (15*sizeof(double) bytes). I then divide it into three pieces and want to bind each piece (of length 5) to one of three NUMA nodes (say 0, 1, and 2). Currently, I am doing the following:

double* A=(double*)valloc(15*sizeof(double));




First question: am I doing this right? I.e., are there any problems with alignment to the page size, for example? Currently, with a size of 15 for array A, it runs fine. But if I change the array size to something like 6156000 with piece=2052000, so that the three calls to mbind start at &A[0], &A[2052000], and &A[4104000], I get a segmentation fault (and sometimes it just hangs). Why does it run fine for the small size but segfault for the larger one? Thanks.


  • For this to work, you need to deal with chunks of memory that are at least page-sized and page-aligned, which means 4 KB on most systems. In your case, I suspect the page gets moved twice (possibly three times), because you call mbind() three times on it.

    The way NUMA memory is laid out is that CPU socket 0 has the range 0..X-1 MB, socket 1 has X..2X-1, socket 2 has 2X..3X-1, etc. Of course, if you put a 4GB stick of RAM next to socket 0 and a 16GB stick next to socket 1, then the distribution isn't even. But the principle still stands that a large chunk of memory is assigned to each socket, according to where the memory is physically located.

    As a consequence of this layout, the physical memory you are using has to be placed into the linear (virtual) address space through page mapping, so a page is the smallest unit that can be assigned to a node.

    So, for large "chunks" of memory, it is fine to move it around, but for small chunks, it won't work quite right - you certainly can't "split" a page into something that is affine to two different CPU sockets.


    To split an array, you first need to find the page-aligned size.

    page_size = sysconf(_SC_PAGESIZE);
    objs_per_page = page_size / sizeof(A[0]); 
    // Each page should hold a whole number of "objects". This checks that
    // no object straddles a page boundary.
    ASSERT(page_size % sizeof(A[0]) == 0);
    split_three = SIZE / 3; 
    aligned_size = (split_three / objs_per_page) * objs_per_page;
    remnant = SIZE - (aligned_size * 3);
    piece = aligned_size;
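    // Third piece: start on a page boundary (mbind() requires a page-aligned
    // address) and let it absorb the remnant at the end of the array.
    mbind(&A[aligned_size*2],(piece+remnant)*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);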

    Obviously, you will now need to divide the work among the three threads in the same way, using the aligned size and the remnant as needed.
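
To make the arithmetic concrete, here is a minimal sketch of the split as plain C, with no NUMA calls so it can be compiled anywhere. The names `struct piece`, `offset_in_page`, and `split_page_aligned` are hypothetical helpers invented for this illustration, and it assumes the array base is itself page-aligned (as it is when it comes from valloc) and that the last piece takes the remnant.

```c
#include <stddef.h>

/* One contiguous run of array elements intended for one NUMA node.
 * (Hypothetical helper type, not part of any NUMA API.) */
struct piece {
    size_t start;   /* index of the first element in the piece */
    size_t count;   /* number of elements in the piece */
};

/* Byte offset of element `index` inside its page, assuming the array
 * base is page-aligned. A result of 0 means &A[index] is page-aligned. */
size_t offset_in_page(size_t index, size_t elem_size, size_t page_size)
{
    return (index * elem_size) % page_size;
}

/* Split n_elems elements into `parts` pieces that all start on a page
 * boundary. Every piece gets the same whole number of pages, and the
 * last piece additionally absorbs the remnant.
 * Returns 0 on success, -1 if the inputs cannot be split this way
 * (for example, an element size that does not divide the page size). */
int split_page_aligned(size_t n_elems, size_t elem_size, size_t page_size,
                       size_t parts, struct piece *out)
{
    if (parts == 0 || elem_size == 0 || page_size % elem_size != 0)
        return -1;

    size_t objs_per_page = page_size / elem_size;
    /* Round the per-piece element count down to a whole number of pages. */
    size_t aligned = (n_elems / parts / objs_per_page) * objs_per_page;

    for (size_t i = 0; i < parts; i++) {
        out[i].start = i * aligned;
        out[i].count = aligned;
    }
    /* Whatever is left over goes to the final piece. */
    out[parts - 1].count = n_elems - (parts - 1) * aligned;
    return 0;
}
```

With the numbers from the question and an assumed 4096-byte page, `offset_in_page(2052000, sizeof(double), 4096)` is 3328, so &A[2052000] is not on a page boundary. `split_page_aligned(6156000, sizeof(double), 4096, 3, p)` instead yields pieces starting at indices 0, 2051584, and 4103168, with the last piece holding 2052832 elements. Each piece's `&A[p[i].start]` and `p[i].count*sizeof(double)` would then be the address and length passed to mbind for node i. Note that for an array smaller than one page per piece (such as the 15-element case), the aligned count is 0 and everything lands in the last piece, which reflects that such an array cannot be split across nodes at all.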