
STM32H7 performance


I would appreciate a brief explanation of why my assembler timing loop on a NUCLEO-H723ZG board appears to execute each iteration in a single CPU clock cycle. The two instructions used, a SUBS and a BNE, would normally consume three clock cycles while the loop is branching, so there is some magic afoot! I am using the GPIO BSRR to toggle an LED and need a timing loop count of 275M to achieve approximately one flash per second.
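The single-cycle figure can be sanity-checked with a little arithmetic. The sketch below is only illustrative and rests on two assumptions that the question does not state: that the core runs at 550 MHz (the maximum rated clock of the STM32H723) and that the 275M-count delay runs once per LED state, so each delay lasts about 0.5 s. The function name is hypothetical.

```c
/* Estimated core clock cycles spent per loop iteration.
 *   core_hz    : core clock frequency in Hz (assumed, not given in the question)
 *   delay_s    : wall-clock duration of one delay loop in seconds (assumed)
 *   iterations : loop count loaded into the counter (275M in the question)
 */
static double cycles_per_iteration(double core_hz, double delay_s, double iterations)
{
    /* Total cycles available during the delay, divided by the number of passes. */
    return (core_hz * delay_s) / iterations;
}
```

Calling cycles_per_iteration(550e6, 0.5, 275e6) gives 1.0, which matches the observation that SUBS plus BNE together cost about one cycle per pass. With a different clock setting or a different flash pattern, the same formula would give a different figure.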


Solution

  • For the Cortex-M0, M3 and M4, the instruction cycle counts are included in the technical reference manual (e.g. the one for the Cortex-M4). For the M7 they are not published, but it sounds like you have measured the answer for yourself, so you do not need it to be in the manual in this case.

    If your code is correct, then the processor is able to execute those two instructions in a single cycle.

    This is not surprising. For example, the M4 can carry out a 16-bit data processing instruction and an IT instruction in a single cycle, by folding the two together.

    You can disable this if you require deterministic (but worse) performance. See the DISFOLD bit in the Auxiliary Control Register.
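As a rough illustration of that last point, the sketch below sets the DISFOLD bit. It assumes DISFOLD is bit 2 of the Auxiliary Control Register (ACTLR) and that ACTLR sits at address 0xE000E008, as on the Cortex-M3, M4 and M7. The helper name and the pointer parameter are hypothetical, chosen so the logic can also be exercised on a host machine against an ordinary variable.

```c
#include <stdint.h>

/* Assumed position of DISFOLD in ACTLR: bit 2. */
#define ACTLR_DISFOLD_MASK (UINT32_C(1) << 2)

/* Hypothetical helper: sets DISFOLD in the register that `actlr` points to,
 * leaving every other bit unchanged, and returns the resulting value.
 * On the target, pass (volatile uint32_t *)0xE000E008 (assumed ACTLR address). */
static uint32_t disable_folding(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISFOLD_MASK;  /* read-modify-write keeps the other control bits */
    return *actlr;
}
```

In a CMSIS-based project the same effect should be available as SCnSCB->ACTLR |= SCnSCB_ACTLR_DISFOLD_Msk; followed by the usual DSB and ISB barriers. Re-running the timing loop afterwards would show how much of the speed-up comes from folding.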