On ARMv7, which is Thumb capable, is it right that we can avoid all the veneers by using the BX instruction?
Since this instruction takes a 32-bit register, are we good?
If yes, when I see veneers in the generated code, I should specialize the output for my machine, right?
Thanks
Yes, since BX takes a 32-bit register, there's no need for veneers because you can cover the whole address space.
Of course you'd first need to load a 32-bit value into the register, which usually means loading it from a literal (constant) pool, so if you are looking to squeeze every cycle out of it and your program is not too large, you're better off with relative branches. As @Notlikethat notes, if you don't already have the address in a register there's no point in using BX when you can just LDR PC, ... (unless you need to support ARMv4T interworking).
Relative, non-conditional, 32-bit Thumb branches have a 24-bit addressing space, so you can reach ±16 MB (other branch encodings have different ranges). If you're targeting ELF, be really careful with 16-bit relative Thumb branches. A 32-bit branch generates a 24-bit relocation, and the linker will insert a veneer if the target can't be addressed with 24 bits. A 16-bit branch generates an 11-bit relocation, and ELF for ARM specifies that the linker is not required to generate veneers for those, so you risk an out-of-range branch error at link time.
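
To make the range arithmetic concrete, here is a minimal sketch that derives the reach of each branch from the width of its immediate. It assumes the immediate is signed and shifted left by one bit (Thumb targets are halfword aligned), and that the offset is measured from the PC value the branch sees rather than from the branch's own address. The function names are made up for illustration; this is not linker code.

```python
# Reach of a PC-relative Thumb branch, derived from its immediate width.
# Assumption: the immediate is signed and scaled by 2 (halfword
# granularity), so an N-bit immediate covers [-2**N, 2**N - 2] bytes.

def thumb_branch_range(imm_bits):
    """Return (lowest, highest) reachable byte offset for the given width."""
    lowest = -(1 << imm_bits)
    highest = (1 << imm_bits) - 2
    return lowest, highest

def reachable(offset, imm_bits):
    """True if a halfword-aligned byte offset fits in the immediate."""
    lowest, highest = thumb_branch_range(imm_bits)
    return offset % 2 == 0 and lowest <= offset <= highest

if __name__ == "__main__":
    # Hypothetical offsets: 1 KB, 4 KB, 1 MB and 32 MB away from the branch.
    for offset in (0x400, 0x1000, 0x100000, 0x2000000):
        print(hex(offset),
              "16-bit B:", reachable(offset, 11),
              "32-bit B.W:", reachable(offset, 24))
```

An offset that fails the 24-bit check is where the linker would step in with a veneer. One that fails only the 11-bit check is the risky case described above, because the linker is not required to rescue a 16-bit branch.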