If you look at the documentation for operations like cmp, test, add, sub, and and, you will notice that operations that involve register EAX and its 16- and 8-bit variants as the first operand have a distinct opcode, which is different from the "general case" version of these instructions.
Is this separate opcode merely a way to save code space, is it at all more efficient than the general-case opcode, or is it just some relic of the past that isn't worth shaking off for compatibility reasons?
This is primarily a relic of the past, but not exactly "obsolete" either.
In the early days (i.e., on the Intel 8086 and 8088), the x86 register set was actually much more specialized, like other contemporary CISC processors. (The 8086's design was itself descended from the Intel 8080, the same chip the Zilog Z80 was derived from.) That is to say, the 8 registers were not all general-purpose like they (functionally) are today. Many instructions worked only on hard-coded registers. This meant that programmers frequently found themselves shuffling values back and forth among registers to get things set up correctly for the next instruction.
EAX was a particularly special register. Well, actually, back in those days, it was known as AX, since it was only 16 bits and hadn't yet been Extended to 32 bits. AX is the Accumulator, and was used as a hard-coded destination by lots of different instructions. An accumulator is the register where intermediate results are stored—it "accumulates" the results of logical and arithmetic operations. Nearly all early microprocessors had an accumulator register, and many of them forced you to use the accumulator in this way. The x86 architecture was in many cases more flexible, but it was still inspired by that design. A detailed write-up about the logic behind the x86 register set is here.
These special variants of the common instructions are a consequence of that design. They are short (a single opcode byte with no ModRM byte, so the whole instruction is one byte shorter than the general encoding with the same immediate) and fast (mostly because of the small instruction size, but presumably there were also optimizations at the silicon level back in the early days) ways of interacting with values in the accumulator register.
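To make the size difference concrete, here is a minimal sketch that builds both encodings of a 32-bit "op reg, imm32" instruction by hand. The function names accumulator_form and general_form are hypothetical helpers invented for this illustration. The opcode values are the documented ones for these instructions, and the sketch covers only 32-bit operand size with a full 32-bit immediate.

```python
import struct

# /digit opcode-extension numbers for the "group 1" ALU instructions.
GROUP1 = {"add": 0, "and": 4, "sub": 5, "cmp": 7}

def accumulator_form(op: str, imm: int) -> bytes:
    """Short form `op eax, imm32`: one opcode byte, no ModRM byte."""
    opcode = (GROUP1[op] << 3) | 0x05
    return bytes([opcode]) + struct.pack("<I", imm & 0xFFFFFFFF)

def general_form(op: str, reg: int, imm: int) -> bytes:
    """General form `op r32, imm32`: opcode 0x81 followed by a ModRM byte.

    `reg` is the register number (0 = EAX, 1 = ECX, 2 = EDX, 3 = EBX, ...).
    """
    modrm = 0xC0 | (GROUP1[op] << 3) | reg  # mod=11 selects a register operand
    return bytes([0x81, modrm]) + struct.pack("<I", imm & 0xFFFFFFFF)

if __name__ == "__main__":
    for op in GROUP1:
        short = accumulator_form(op, 0x12345678)
        general = general_form(op, 0, 0x12345678)  # same operation, still on EAX
        print(f"{op:3}  short: {short.hex(' ')}   general: {general.hex(' ')}")
```

For add eax, 0x12345678 this gives 05 78 56 34 12 (5 bytes) against 81 c0 78 56 34 12 (6 bytes). One caveat: when the immediate fits in a signed byte, the general 0x83 form with a sign-extended 8-bit immediate is only 3 bytes, so the accumulator form pays off mainly for 8-bit operations and for immediates that need the full width.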
So yes, it is precisely a way to save code space, and yes, it is still more efficient than the general-case encodings, precisely because fewer bytes are required to encode the instructions. Small code size is not as important today as it was back with the 8088, of course, given our significantly larger instruction caches and faster memory read speeds, but it still makes a difference. Any good x86 assembly programmer knows to prefer these short accumulator-based instructions whenever possible, and many compilers do too. It is especially important in inner loops, where keeping code size down helps ensure everything stays in the cache. Register usage and even code flow are often carefully reevaluated and rearranged to keep as many things in the accumulator as possible, even today, precisely so that these short, efficient opcodes can be used.
See also: Peter Cordes's excellent "Tips for golfing in x86/x64 machine code", which has more specific details about short-form encodings.