linux, openmp, numa, numactl

numactl and move_pages mismatch


I have developed a simple program to test in which NUMA node a page is, based on this question.

The problem is that my program's results do not match the output of numactl -H on a Xeon E5-2698 v4 (two NUMA nodes). numactl -H shows (cropped):

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79

So, for example, numactl says that cpu 20 is on node 1. I have the following code:

#include <unistd.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <numaif.h>
#include <omp.h>

int numa_node(void *ptr) {
  int status;
  int ret_code;
  /* With nodes == NULL, move_pages() moves nothing: it just reports,
     in status, the node each page currently resides on. */
  if((ret_code = move_pages(0, 1, &ptr, NULL, &status, 0)) == -1) {
    perror("move_pages");
    return -1;
  }
  return status;
}

int main(int argc, char* argv[]) {
  int pgsize = getpagesize();
  printf("NUMA test(pgsize=%d)\n",pgsize);
  #pragma omp parallel firstprivate(pgsize)
  {
    if(omp_get_thread_num() == 20) {
      char *m = aligned_alloc(pgsize, pgsize);
      m[0] = 'a';  /* first touch: the page should land on this thread's node */
      if(mlock(m, 10) == -1) {
        perror("mlock");
      }
      else {
        int node = numa_node(m);
        printf("thread %d: node %d\n",20,node);
      }
    }
  }
}

I'm using aligned_alloc to allocate a single, page-aligned page, so that when this thread "touches" the page it gets mapped to the NUMA node the thread runs on (first-touch policy). Then I use mlock, which you can check in this question. I assume first touch applies, since I haven't changed any memory policy, but I don't know how to verify that.
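To make that check concrete, here is a minimal sketch (not part of my original program; it assumes glibc's sched_getcpu(), which needs _GNU_SOURCE, and libnuma's numa_node_of_cpu(), so link with -lnuma; report_thread_location is just an illustrative name) that reports which cpu and NUMA node the calling thread is on:

/* Minimal sketch: report which cpu/node the calling thread is running on.
 * Assumes glibc's sched_getcpu() (needs _GNU_SOURCE) and libnuma's
 * numa_node_of_cpu() (link with -lnuma). */
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu */
#include <numa.h>    /* numa_node_of_cpu */
#include <stdio.h>

void report_thread_location(int tid) {
  int cpu  = sched_getcpu();          /* cpu the thread is on right now */
  int node = numa_node_of_cpu(cpu);   /* NUMA node that cpu belongs to */
  printf("thread %d runs on cpu %d (node %d)\n", tid, cpu, node);
}

Calling this right after m[0] = 'a' and comparing its node with the value returned by numa_node(m) would show whether first touch placed the page on the thread's local node.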

I'm compiling this with icc -fopenmp -lnuma and running it with KMP_AFFINITY=granularity=fine,compact and OMP_NUM_THREADS=80, launched as numactl -m 0,1 ./numa. I'm using this affinity setting because I think it assigns threads to cpus in the same order that numactl lists them. This outputs:

NUMA test(pgsize=4096)
thread 20: node 0

So this program says thread 20's page is on node 0, but numactl says cpu 20 belongs to node 1. Why? I was expecting both to agree.


Solution

  • The requested affinity setting compact places consecutive threads on neighboring hardware threads (Hyper-Threading) of the same core. The OS numbers the additional hardware threads last, so cpu 0 and cpu 40 are on the same core. The mapping will be as follows:

    tid -> cpu
    0 -> 0
    1 -> 40
    2 -> 1
    3 -> 41
    ...
    20 -> 10
    

    You can see this by adding ,verbose to the KMP_AFFINITY setting, or by printing the mapping yourself, as sketched below. If you want a direct mapping, you can use GOMP_CPU_AFFINITY=0-79 instead of the KMP_AFFINITY settings. That should do the trick to get the memory on the right NUMA node.
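    As a cross-check, here is a small sketch (assuming libnuma and glibc's sched_getcpu() are available; the file name in the build line is only an example) that prints the tid -> cpu -> node mapping an affinity setting actually produced:

    /* Sketch: print which cpu and NUMA node each OpenMP thread runs on.
     * Build e.g. with: icc -fopenmp affinity_map.c -lnuma */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <numa.h>
    #include <omp.h>
    #include <stdio.h>

    int main(void) {
      #pragma omp parallel
      {
        int tid  = omp_get_thread_num();
        int cpu  = sched_getcpu();
        int node = numa_node_of_cpu(cpu);
        #pragma omp critical
        printf("tid %2d -> cpu %2d (node %d)\n", tid, cpu, node);
      }
      return 0;
    }

    With KMP_AFFINITY=granularity=fine,compact this should show tid 20 on cpu 10 (node 0); with GOMP_CPU_AFFINITY=0-79 it should show tid 20 on cpu 20 (node 1).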