
Event counters in ARM Cortex-A7


How many event counters does the ARM Cortex-A7 support, and how can I select, read, and write these counters?

For example, if I run:

./perf stat -e L1-dcache-loads,branch-loads sleep 1

where are the event counts stored?
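For the "how many" part, one option is to ask the hardware. In the ARMv7 Performance Monitors extension, bits [15:11] of the control register PMCR, accessed as (c9, c12, 0), form the N field: the number of implemented event counters, not counting the separate cycle counter. The sketch below assumes GCC-style inline assembly. The names pmcr_num_counters and read_pmcr are hypothetical, and the mrc read is only compiled for 32-bit ARM. It will raise an undefined-instruction exception in user mode unless the kernel has enabled user access through PMUSERENR, so in practice it belongs in kernel code.

```c
#include <stdint.h>

/* PMCR.N occupies bits [15:11]: the number of implemented event counters.
 * The dedicated cycle counter (PMCCNTR) is extra and not included in N. */
static inline unsigned pmcr_num_counters(uint32_t pmcr)
{
    return (pmcr >> 11) & 0x1fu;
}

#if defined(__arm__)
/* Hypothetical helper: read PMCR (c9, c12, 0). Traps in user mode unless
 * PMUSERENR has been set up by the kernel to allow user access. */
static inline uint32_t read_pmcr(void)
{
    uint32_t val;
    __asm__ __volatile__("mrc p15, 0, %0, c9, c12, 0" : "=r" (val));
    return val;
}
#endif
```

In kernel context, pmcr_num_counters(read_pmcr()) then gives the count directly.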

Here you can see that {c9, c13, 0} represents the cycle count register and {c9, c13, 2} represents the event count register. So after executing the perf command, which register's value will change, c9 or c13?
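To see how the operands of these mcr/mrc instructions map onto named registers, here is a small lookup table in C. It lists only a few ARMv7 PMU registers (all accessed as p15 with opc1 = 0), using their architectural names; the struct and the function pmu_reg_name are hypothetical helpers for illustration.

```c
#include <stddef.h>

/* One p15 system register: MRC/MCR p15, 0, Rt, CRn, CRm, opc2 */
struct cp15_reg {
    unsigned crn, crm, opc2;
    const char *name;
};

/* A few ARMv7 PMU registers (all use opc1 = 0). */
static const struct cp15_reg pmu_regs[] = {
    { 9, 12, 0, "PMCR" },       /* control; N field = number of counters  */
    { 9, 12, 5, "PMSELR" },     /* selects which event counter to access  */
    { 9, 13, 0, "PMCCNTR" },    /* dedicated cycle counter                */
    { 9, 13, 1, "PMXEVTYPER" }, /* event type of the selected counter     */
    { 9, 13, 2, "PMXEVCNTR" },  /* value of the selected counter          */
    { 9, 14, 0, "PMUSERENR" },  /* user-mode access enable                */
};

/* Returns the register name for a (CRn, CRm, opc2) triple, or NULL. */
static const char *pmu_reg_name(unsigned crn, unsigned crm, unsigned opc2)
{
    for (size_t i = 0; i < sizeof pmu_regs / sizeof pmu_regs[0]; i++) {
        const struct cp15_reg *r = &pmu_regs[i];
        if (r->crn == crn && r->crm == crm && r->opc2 == opc2)
            return r->name;
    }
    return NULL;
}
```

So pmu_reg_name(9, 13, 2) yields "PMXEVCNTR": the whole triple names one register, and swapping the operands (13, 9, 2) names nothing in this table.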

Now consider the following code from the Linux kernel's ARMv7 perf driver:

static inline int armv7_pmnc_select_counter(int idx)
{
        u32 counter = ARMV7_IDX_TO_COUNTER(idx);
        /* Write PMSELR (c9, c12, 5): choose which event counter
         * the shared registers at (c9, c13, 1/2) refer to */
        asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));
        return idx;
}

static inline void armv7pmu_write_counter(struct perf_event *event, u32 value)
{
        struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
        struct hw_perf_event *hwc = &event->hw;
        int idx = hwc->idx;

        if (!armv7_pmnc_counter_valid(cpu_pmu, idx))
                pr_err("CPU%u writing wrong counter %d\n", smp_processor_id(), idx);
        else if (idx == ARMV7_IDX_CYCLE_COUNTER)
                /* The cycle counter has its own register, PMCCNTR (c9, c13, 0) */
                asm volatile("mcr p15, 0, %0, c9, c13, 0" : : "r" (value));
        else if (armv7_pmnc_select_counter(idx) == idx)
                /* PMXEVCNTR (c9, c13, 2): writes the counter selected above */
                asm volatile("mcr p15, 0, %0, c9, c13, 2" : : "r" (value));
}

For each event counter, armv7pmu_write_counter selects a different idx value through armv7_pmnc_select_counter, but to update the value it always issues the same mcr instruction. How does that work?


Solution

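  • Because the second register, (c9, c13, 2) or PMXEVCNTR, is a data register that gives read and write access to a counter value, while the first, (c9, c12, 5) or PMSELR, is an index register that selects which actual counter the data register operates on. (Note that c9 and c13 are not two separate registers: in an mcr/mrc to p15, the CRn, CRm and opc2 operands together encode a single system register. A write to (c9, c13, 2) therefore updates whichever event counter PMSELR currently selects, while (c9, c13, 0) is the separate cycle counter, PMCCNTR.)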

    The typical reason to have such a setup is so that different implementations can provide different numbers of registers without changing the overall register map. In the case of ARMv7 PMUs, it isn't a great use of the relatively limited system register encoding space to have 32 count registers and 32 event type registers, most of which will be unimplemented, and you certainly wouldn't want registers to move around depending on how many counters this particular CPU implements.

    If it helps, imagine something like this:

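    // Toy model: one selector plus one data "window" onto NUMBER counters.
    constexpr unsigned NUMBER = 4;  // hypothetical number of implemented counters

    class PMU {
    private:
        unsigned sel = 0;
        int counter[NUMBER] = {};

    public:
        unsigned num_counters() const   { return NUMBER; }

        void select_counter(unsigned i) { sel = i % NUMBER; }     // cf. PMSELR

        void write_counter(int d)       { counter[sel] = d; }     // cf. PMXEVCNTR write
        int  read_counter() const       { return counter[sel]; }  // cf. PMXEVCNTR read
    };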
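The same idea can be written in C so that it mirrors the shape of armv7pmu_write_counter() together with a matching read path. This is a simulation, not driver code: the struct stands in for the hardware registers, NUM_EVT_COUNTERS = 4 is an assumed counter count, and the index convention (0 = cycle counter, 1..N = event counters 0..N-1) is modelled on the kernel's ARMV7_IDX_* macros. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_EVT_COUNTERS  4  /* assumed; real hardware reports this in PMCR.N */
#define IDX_CYCLE_COUNTER 0  /* modelled on the kernel's ARMV7_IDX_CYCLE_COUNTER */
#define IDX_COUNTER0      1  /* modelled on the kernel's ARMV7_IDX_COUNTER0 */

/* Simulated PMU: the visible registers plus the counter bank that software
 * can only reach through the PMSELR/PMXEVCNTR pair. */
struct sim_pmu {
    uint32_t pmselr;                  /* selector, like (c9, c12, 5)      */
    uint32_t pmccntr;                 /* cycle counter, like (c9, c13, 0) */
    uint32_t evcnt[NUM_EVT_COUNTERS]; /* event counters 0..N-1            */
};

/* Like "mcr p15, 0, rX, c9, c12, 5": pick which counter the window shows. */
static void sim_write_pmselr(struct sim_pmu *p, uint32_t sel)
{
    p->pmselr = sel;
}

/* Like "mcr/mrc p15, 0, rX, c9, c13, 2": one access path for every event
 * counter; the selector decides which one is actually touched. */
static void sim_write_pmxevcntr(struct sim_pmu *p, uint32_t value)
{
    if (p->pmselr < NUM_EVT_COUNTERS)  /* out-of-range selects ignored in this sim */
        p->evcnt[p->pmselr] = value;
}

static uint32_t sim_read_pmxevcntr(const struct sim_pmu *p)
{
    return p->pmselr < NUM_EVT_COUNTERS ? p->evcnt[p->pmselr] : 0;
}

static bool sim_idx_valid(int idx)
{
    return idx >= IDX_CYCLE_COUNTER && idx < IDX_COUNTER0 + NUM_EVT_COUNTERS;
}

/* Same shape as armv7pmu_write_counter(): validate, special-case the cycle
 * counter, otherwise select first and then write the shared data register. */
static bool sim_write_counter(struct sim_pmu *p, int idx, uint32_t value)
{
    if (!sim_idx_valid(idx))
        return false;
    if (idx == IDX_CYCLE_COUNTER) {
        p->pmccntr = value;
    } else {
        sim_write_pmselr(p, (uint32_t)(idx - IDX_COUNTER0));
        sim_write_pmxevcntr(p, value);
    }
    return true;
}

static uint32_t sim_read_counter(struct sim_pmu *p, int idx)
{
    if (!sim_idx_valid(idx))
        return 0;
    if (idx == IDX_CYCLE_COUNTER)
        return p->pmccntr;
    sim_write_pmselr(p, (uint32_t)(idx - IDX_COUNTER0));
    return sim_read_pmxevcntr(p);
}
```

For example, writing 100 through index 1 and 400 through index 4 lands in evcnt[0] and evcnt[3], even though both go through the same sim_write_pmxevcntr() call, which never takes an index of its own.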