assembly, optimization, x86, x86-64, micro-optimization

`test` vs `cmp` for one-bit registers comparison


In x86, when I have two registers, and I know each of them has exactly one bit set, and I want to know whether they're equal, I can use either `test` or `cmp` (`cmp a, b` gives a zero result, i.e. sets ZF, when they're equal; `test a, b` gives a zero result when they're not equal).
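As a sanity check of that premise, here is a minimal C sketch that models only the zero flag: ZF after `cmp` as "the difference is zero" and ZF after `test` as "the bitwise AND is zero". The helper names (`one_bit`, `zf_after_cmp`, `zf_after_test`, `count_disagreements`) are made up for illustration, and the code does not execute the real instructions or model any other flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a 64-bit value with only bit i set. */
uint64_t one_bit(unsigned i) {
    assert(i < 64);               /* shifting by 64 or more is undefined */
    return UINT64_C(1) << i;
}

/* Models ZF after `cmp a, b`: set when a - b is zero. */
bool zf_after_cmp(uint64_t a, uint64_t b) { return (a - b) == 0; }

/* Models ZF after `test a, b`: set when a & b is zero. */
bool zf_after_test(uint64_t a, uint64_t b) { return (a & b) == 0; }

/* Counts one-bit pairs for which either check misjudges equality. */
int count_disagreements(void) {
    int bad = 0;
    for (unsigned i = 0; i < 64; i++) {
        for (unsigned j = 0; j < 64; j++) {
            uint64_t a = one_bit(i), b = one_bit(j);
            bool equal = (a == b);
            if (zf_after_cmp(a, b) != equal) bad++;   /* cmp: ZF=1 means equal */
            if (zf_after_test(a, b) == equal) bad++;  /* test: ZF=0 means equal */
        }
    }
    return bad;
}
```

Under these assumptions `count_disagreements()` returns 0 across all 64 x 64 one-bit pairs. The equivalence depends on the one-bit guarantee: for 3 and 1, `zf_after_test` reports a non-zero result even though the values differ.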

Questions like "In x86 what's difference between `test eax,eax` and `cmp eax,0`" or "Test whether a register is zero with CMP reg,0 vs OR reg,reg?" say that `test` is preferred over `cmp` when comparing to zero. Does this advice still hold when comparing two registers? Or does the fact that one check needs a zero result and the other a non-zero result change anything?

I'm mainly interested in comparing 64-bit registers on 64-bit processors, but if 32-bit behaves differently I would like to hear about that too. The latest Alder Lake and Zen 3 matter most, but other processors may be interesting too.


Solution

  • In the scenario you described, both instructions perform identically on recent microarchitectures. On Alder Lake P, both can run on ports 0, 1, 5, 6, and 11 with a reciprocal throughput of 0.2 cycles (0.25 cycles and slightly fewer ports on Alder Lake E), while on Zen 3, both run on 4 ports with a reciprocal throughput of 0.25 cycles. The latency is 1 cycle in both cases.

    As for macro fusion, both instructions fuse with je and jne, which are the branches you would use here.

    So in this particular case it makes no difference. There may be a difference in other use cases, e.g. when immediates or other conditions are involved.