c linux profiling perf

perf_event_open - limit when monitoring multiple events


Is there a limit to the number of PERF_TYPE_HARDWARE events that can be monitored in a single group and read with PERF_FORMAT_GROUP?

I am monitoring multiple events and can monitor 5 of them in one group, but as soon as I add a 6th hardware event, the values of all events in the group stop updating.

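#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

struct read_format {
  uint64_t nr;          /* The number of events */
  struct {
    uint64_t value;     /* The value of the event */
    uint64_t id;        /* if PERF_FORMAT_ID */
  } values[];           /* nr entries follow */
};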

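int main(void) {
  struct perf_event_attr attr1;
  memset(&attr1, 0, sizeof(attr1));  /* fields left unset must be zero */
  attr1.size = sizeof(attr1);
  attr1.type = PERF_TYPE_HARDWARE;
  attr1.config = PERF_COUNT_HW_CPU_CYCLES;
  attr1.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
  /* group_fd = -1: this event is the group leader */
  int main_fd = syscall(__NR_perf_event_open, &attr1, 0, -1, -1, 0);
  if (main_fd == -1) { perror("perf_event_open"); return 1; }
  uint64_t id1;
  ioctl(main_fd, PERF_EVENT_IOC_ID, &id1);
  ioctl(main_fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(main_fd, PERF_EVENT_IOC_ENABLE, 0);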

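  struct perf_event_attr attr2;
  memset(&attr2, 0, sizeof(attr2));  /* fields left unset must be zero */
  attr2.size = sizeof(attr2);
  attr2.type = PERF_TYPE_HARDWARE;
  attr2.config = PERF_COUNT_HW_CACHE_REFERENCES;
  attr2.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
  /* group_fd = main_fd: join the leader's group */
  int fd2 = syscall(__NR_perf_event_open, &attr2, 0, -1, main_fd, 0);
  if (fd2 == -1) { perror("perf_event_open"); return 1; }
  uint64_t id2;
  ioctl(fd2, PERF_EVENT_IOC_ID, &id2);
  ioctl(fd2, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);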

  /*
  attr3 through attr7 are omitted here. They are set up exactly like attr2,
  except for config:
  attr3.config = PERF_COUNT_HW_CACHE_MISSES;
  attr4.config = PERF_COUNT_HW_BRANCH_MISSES;
  attr5.config = PERF_COUNT_HW_BUS_CYCLES;
  attr6.config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND;
  attr7.config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND;
  */

  // read_values and log "START"

  // action

  // read_values and log "END"

  return 0;
}

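/* Reads every counter in the group through the leader's fd. */
int read_values(int group_fd) {
  char buffer[4096];
  ssize_t read_bytes = read(group_fd, buffer, sizeof(buffer));
  if (read_bytes == -1) { return 1; }

  struct read_format* rf = (struct read_format*) buffer;
  uint64_t values[rf->nr];  /* counters are 64-bit; int would truncate */
  for (uint64_t i = 0; i < rf->nr; i++) {
    values[i] = rf->values[i].value;
  }
  /* log values here */
  return 0;
}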

In the above code, every value logged at "START" and "END" updates as expected when I open perf events for attr1 through attr5 only. However, when I open 6 events (or all 7 hardware events), the values logged at "END" are exactly the same as those logged at "START".

I can get values for all 7 events if I open each one as its own leader, e.g. int fd2 = syscall(__NR_perf_event_open, &attr2, 0, -1, -1 /* instead of main_fd */, 0);, and then call read() on each fd. But to reduce the number of read calls, I'd prefer to read from main_fd and take the values from there. Is there a limit to the number of hardware events that can be captured in a single PERF_FORMAT_GROUP? I've noticed that I don't see this issue if the 6th and 7th events are software events.
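A possible middle ground between one read() per event and a single large group is several smaller groups, each read through its own leader. The sketch below only plans which event leads which group. The helper names are hypothetical, and the group size that works is an assumption that has to be found per machine (5 on the machine described above).

```c
#include <stddef.h>

/* Hypothetical helper: for event i (0-based), return the index of the
 * event that acts as its group leader when at most max_per_group events
 * share a group. max_per_group must be greater than 0.
 * If the result equals i, event i is itself a leader. */
size_t leader_index(size_t i, size_t max_per_group) {
  return i - (i % max_per_group);
}

/* Number of groups, and therefore read() calls, needed for n_events. */
size_t group_count(size_t n_events, size_t max_per_group) {
  return (n_events + max_per_group - 1) / max_per_group;
}

/* Usage sketch (attrs[] and fds[] are assumed arrays, not shown here):
 *
 *   size_t lead = leader_index(i, max_per_group);
 *   int group_fd = (lead == i) ? -1 : fds[lead];
 *   fds[i] = syscall(__NR_perf_event_open, &attrs[i], 0, -1, group_fd, 0);
 */
```

With 7 events and a group size of 5, this gives two groups (events 0 to 4 and events 5 to 6), so two read() calls instead of seven.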


Solution

  • Figured I'd post an answer to my own question as I went down the rabbit hole and came across this post. Per the "Event Groups" section:

    The number of available performance counters depend on the CPU. A group cannot contain more events than available counters. For example Intel Core CPUs typically have four generic performance counters for the core, plus three fixed counters for instructions, cycles and ref-cycles. Some special events have restrictions on which counter they can schedule, and may not support multiple instances in a single group. When too many events are specified in the group some of them will not be measured.
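One way to check whether a group was ever placed on the hardware counters is to add PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING to the leader's read_format and compare the two times. The sketch below shows the resulting read layout and a small classifier. The struct and function names are hypothetical, and the interpretation assumes the group has already been enabled for some time.

```c
#include <stdint.h>

/* Layout returned by read() on a group leader opened with
 * PERF_FORMAT_GROUP | PERF_FORMAT_ID |
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING. */
struct group_read {
  uint64_t nr;            /* number of events in the group */
  uint64_t time_enabled;  /* time the group was enabled */
  uint64_t time_running;  /* time the group was actually counting */
  struct {
    uint64_t value;
    uint64_t id;
  } values[];             /* nr entries follow */
};

enum group_state {
  GROUP_NEVER_SCHEDULED,  /* enabled, but never counted anything */
  GROUP_MULTIPLEXED,      /* counted only part of the enabled time */
  GROUP_ALWAYS_RUNNING    /* counted for the whole enabled time */
};

/* Hypothetical helper: classify a group from its two time fields.
 * Meant to be called after the group has been enabled for a while. */
enum group_state classify_group(const struct group_read *g) {
  if (g->time_running == 0) return GROUP_NEVER_SCHEDULED;
  if (g->time_running < g->time_enabled) return GROUP_MULTIPLEXED;
  return GROUP_ALWAYS_RUNNING;
}
```

If the oversized group from the question reports GROUP_NEVER_SCHEDULED, that would match the unchanged values at "START" and "END": the events were opened successfully but never counted.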