Search code examples
armperformancecounterperfarmv7chromebook

ARMv7 instructions to access performance counters directly from assembly language


I want to collects performance counter information e.g. branch mispredictions from ARMv7 processors. I'm working on a project where I need to be able to count mispredictions but the device I'm using (my Chromebook) doesn't have the right Linux kernel to use the 'perf' utility. I can't install a Linux that has support for perf on this old machine. The projects that would have supported that seems to be unavailable on the Internet since they are so old. I know there is a way to use ARMv7 instructions to access performance counters directly from assembly language but I'm not sure how to do it.


Solution

  • There are some examples of direct PMU performance counters usage on ARM, for example

    armv7: http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html

    armv8: http://zhiyisun.github.io/2016/03/02/How-to-Use-Performance-Monitor-Unit-(PMU)-of-64-bit-ARMv8-A-in-Linux.html

    So the first thing is to create a kernel module to enable user-mode access to PMU counters. Below is the code to set PMU register PMUSERENR_EL0 to enable user-mode access.

    /*Enable user-mode access to counters. */
    asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)ARMV8_PMUSERENR_EN_EL0|ARMV8_PMUSERENR_ER|ARMV8_PMUSERENR_CR));
    
    /*   Performance Monitors Count Enable Set register bit 30:0 disable, 31 enable. Can also enable other event counters here. */ 
    asm volatile("msr pmcntenset_el0, %0" : : "r" (ARMV8_PMCNTENSET_EL0_ENABLE));
    
    /* Enable counters */
    u64 val=0;
    asm volatile("mrs %0, pmcr_el0" : "=r" (val));
    asm volatile("msr pmcr_el0, %0" : : "r" (val|ARMV8_PMCR_E));
    

    But performance counters are privileged part of system, by default they are only accessible from kernel mode. You can't just use assembly instructions in user space code to use them, and only result you will get is SIGSEGV or other variant of permission denied. To enable access from user-space, some work should be done in kernel mode. It can be any of existing PMU driver: perf or oprofile (older pmu access tool), or it can be some custom kernel module which will enable user-space access to PMU registers. But to compile your module you still need most of kernel development infrastructure for your kernel (I expect that standard chromebook kernel has no kernel includes "kbuild" to do module build, and this kernel may not accept unsigned modules in standard configuration).

    What can you do:

    • Use another machine, something more recent than your outdated chromebook. Your project may have some machines in remote access, you can try to buy some small and popular ARM single-board computer with linux (like raspberry pi 3/4). That popular board will have more recent arm cpu core, and it will have ubuntu or debian
    • Check oprofile subsystem, it may be enabled in your kernel. Oprofile tools are older than perf but can access PMU counters too.
    • Recompile linux kernel with perf_events subsystem enabled. You need only correct kernel which will boot on your chromebook, and any compiler to rebuild perf out-of-tree from https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/ (use any version of perf). Or use perf_event_open syscall directly
    • Check for /lib/modules/`uname -r`/build directory. If it exists, you can try to build custom kernel module to enable user-space direct access

    TRM on pmcr_el0 and other PMU registers: https://developer.arm.com/documentation/100442/0100/debug-registers/aarch64-pmu-registers/pmcr-el0--performance-monitors-control-register--el0 https://developer.arm.com/docs/ddi0595/h/aarch64-system-registers/pmcr_el0 https://developer.arm.com/docs/ddi0595/h/aarch32-system-registers/pmccntr https://developer.arm.com/documentation/ddi0535/c/performance-monitoring-unit and some overview https://people.inf.ethz.ch/markusp/teaching/263-2300-ETH-spring14/slides/08-perfcounters.pdf